From haosun at openjdk.org  Tue Nov  1 06:16:21 2022
From: haosun at openjdk.org (Hao Sun)
Date: Tue, 1 Nov 2022 06:16:21 GMT
Subject: RFR: 8293484: AArch64: Enable SHA512 intrinsic by default on supported hardware
Message-ID: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com>

SHA512 intrinsic for AArch64 was implemented in JDK-8165404. But it was not auto-enabled due to the lack of full test on real hardware.

In this patch, we set this intrinsic enabled by default on hardware with sha512 feature support, after we did the following evaluation.

1) tier1~3 passed without new failures.

2) we ran the JMH test case MessageDigests.java on all available sha512 feature supported CPUs on our hands including Neoverse V1, Neoverse N2 and Apple silicon(M1). We witnessed about 1.3x ~ 3x performance uplifts. Here shows the data on V1.

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt    Before     After   Units
MessageDigests.digest               SHA-384        64     DEFAULT  thrpt    5  2381.028  6161.576  ops/ms
MessageDigests.digest               SHA-384     16384     DEFAULT  thrpt    5    20.641    60.493  ops/ms
MessageDigests.digest               SHA-512        64     DEFAULT  thrpt    5  2407.225  6140.680  ops/ms
MessageDigests.digest               SHA-512     16384     DEFAULT  thrpt    5    20.633    60.942  ops/ms
MessageDigests.getAndDigest         SHA-384        64     DEFAULT  thrpt    5  1962.740  4714.510  ops/ms
MessageDigests.getAndDigest         SHA-384     16384     DEFAULT  thrpt    5    20.474    61.360  ops/ms
MessageDigests.getAndDigest         SHA-512        64     DEFAULT  thrpt    5  1949.511  4552.723  ops/ms
MessageDigests.getAndDigest         SHA-512     16384     DEFAULT  thrpt    5    20.477    59.693  ops/ms

-------------

Commit messages:
 - 8293484: AArch64: Enable SHA512 intrinsic by default on supported hardware

Changes: https://git.openjdk.org/jdk/pull/10925/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10925&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8293484
  Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/10925.diff
  Fetch: git fetch https://git.openjdk.org/jdk
pull/10925/head:pull/10925

PR: https://git.openjdk.org/jdk/pull/10925

From luhenry at openjdk.org  Tue Nov  1 06:16:27 2022
From: luhenry at openjdk.org (Ludovic Henry)
Date: Tue, 1 Nov 2022 06:16:27 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v4]
In-Reply-To: <yyuuqDrwV-WVFKe6dUzIp_Eo_z4m_GKQoYx-pJna3CM=.6ef3d30c-4b2a-4e8d-ba1d-3776e9eb8147@github.com>
References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com>
 <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com>
 <ItgEUpQqe5_lJReITrZWgr5SbqUS1kAanJsyqzhzNuE=.2b5b0e1e-1dc4-4cf3-8d0e-a61d8a4c439c@github.com>
 <L2WAjZaNyLgTJp85sijndF0GLYMMAPObB3ncl3u-U6w=.b2d6a5e6-8aa0-4328-b11e-b258d78e9f4b@github.com>
 <yyuuqDrwV-WVFKe6dUzIp_Eo_z4m_GKQoYx-pJna3CM=.6ef3d30c-4b2a-4e8d-ba1d-3776e9eb8147@github.com>
Message-ID: <3Ri8tG00c3_Ks_AKRvTYyBxa6UINFqjX24aRsKr43-M=.1a793566-8645-4f75-8c45-0a57d6551791@github.com>

On Mon, 31 Oct 2022 22:06:20 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> No you don't need to, the vector loop can be calculated as:
>>
>>     IntVector accumulation = IntVector.zero(INT_SPECIES);
>>     for (int i = 0; i < bound; i += INT_SPECIES.length()) {
>>         IntVector current = IntVector.load(INT_SPECIES, array, i);
>>         accumulation = accumulation.mul(31**(INT_SPECIES.length())).add(current);
>>     }
>>     return accumulation.mul(IntVector.of(31**INT_SPECIES.length() - 1, ..., 31**2, 31, 1).reduce(ADD);
>>
>> Each iteration only requires a multiplication and an addition. The weight of lanes can be calculated just before the reduction operation.
>
> Ok, I can try rewriting as @merykitty suggests and compare. I'm running out of time to spend on this right now, though, so I sort of hope we can do this experiment as a follow-up RFE.

You're right, we can go forward indeed.
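For reference, the lane-wise scheme quoted above can be checked in plain Java without the Vector API. The following is only a sketch under stated assumptions: the class name `LaneHashSketch`, the lane count of 8, and the scalar tail loop are illustrative choices and are not taken from the PR. Each "lane" is emulated with one slot of an `int[]`, the per-iteration step is one multiply by 31^LANES plus one add, and the lane weights 31^(LANES-1), ..., 31, 1 are applied only once before the final sum.

```java
final class LaneHashSketch {
    // Hypothetical lane count; a real IntVector species would supply this.
    static final int LANES = 8;

    // Reference: the usual polynomial hash, h = 31*h + a[i].
    static int scalarHash(int[] a) {
        int h = 0;
        for (int v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    // Lane-wise variant: every lane accumulates its own strided polynomial.
    static int laneHash(int[] a) {
        int bound = a.length - (a.length % LANES);

        // stride = 31^LANES, with ordinary int wrap-around.
        int stride = 1;
        for (int j = 0; j < LANES; j++) {
            stride *= 31;
        }

        int[] acc = new int[LANES];
        for (int i = 0; i < bound; i += LANES) {
            for (int j = 0; j < LANES; j++) {
                // One multiplication and one addition per lane per iteration.
                acc[j] = acc[j] * stride + a[i + j];
            }
        }

        // Apply the lane weights once: lane LANES-1 gets 1, lane 0 gets 31^(LANES-1).
        int h = 0;
        int weight = 1;
        for (int j = LANES - 1; j >= 0; j--) {
            h += acc[j] * weight;
            weight *= 31;
        }

        // Scalar tail for the elements that do not fill a whole block (an assumption of this sketch).
        for (int i = bound; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "The quick brown fox jumps over the lazy dog";
        int[] chars = s.chars().toArray();
        System.out.println(laneHash(chars) == s.hashCode());
    }
}
```

Because all arithmetic wraps modulo 2^32 in both variants, the lane-wise result should agree with `String.hashCode()` for the same characters.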
------------- PR: https://git.openjdk.org/jdk/pull/10847 From luhenry at openjdk.org Tue Nov 1 06:19:25 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 1 Nov 2022 06:19:25 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v4] In-Reply-To: <yyuuqDrwV-WVFKe6dUzIp_Eo_z4m_GKQoYx-pJna3CM=.6ef3d30c-4b2a-4e8d-ba1d-3776e9eb8147@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> <ItgEUpQqe5_lJReITrZWgr5SbqUS1kAanJsyqzhzNuE=.2b5b0e1e-1dc4-4cf3-8d0e-a61d8a4c439c@github.com> <L2WAjZaNyLgTJp85sijndF0GLYMMAPObB3ncl3u-U6w=.b2d6a5e6-8aa0-4328-b11e-b258d78e9f4b@github.com> <yyuuqDrwV-WVFKe6dUzIp_Eo_z4m_GKQoYx-pJna3CM=.6ef3d30c-4b2a-4e8d-ba1d-3776e9eb8147@github.com> Message-ID: <AIV-RBmlKxSSNFeQdyLsXmWWE3ePgAeR1ugrk5CS0Fs=.5f9e08ea-78ca-4e2d-b0b3-c27c27885b1b@github.com> On Mon, 31 Oct 2022 22:06:20 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> No you don't need to, the vector loop can be calculated as: >> >> IntVector accumulation = IntVector.zero(INT_SPECIES); >> for (int i = 0; i < bound; i += INT_SPECIES.length()) { >> IntVector current = IntVector.load(INT_SPECIES, array, i); >> accumulation = accumulation.mul(31**(INT_SPECIES.length())).add(current); >> } >> return accumulation.mul(IntVector.of(31**INT_SPECIES.length() - 1, ..., 31**2, 31, 1).reduce(ADD); >> >> Each iteration only requires a multiplication and an addition. The weight of lanes can be calculated just before the reduction operation. > > Ok, I can try rewriting as @merykitty suggests and compare. I'm running out of time to spend on this right now, though, so I sort of hope we can do this experiment as a follow-up RFE. 
@cl4es i can write the assembly and send it your way if you want

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From fyang at openjdk.org  Tue Nov  1 06:58:30 2022
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 1 Nov 2022 06:58:30 GMT
Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V
In-Reply-To: <1yV4wylyTjD5-MuT7o_IylkxgU_O3xryd3cRlULgdbY=.9b73b7e9-5de1-47dc-ba6f-9dc30b400a9e@github.com>
References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com>
 <1yV4wylyTjD5-MuT7o_IylkxgU_O3xryd3cRlULgdbY=.9b73b7e9-5de1-47dc-ba6f-9dc30b400a9e@github.com>
Message-ID: <J4kgVWIrCihh44u0d9KOAK9UjgP5RFfoxKIXPr1aMUI=.112d58bd-e9e4-4396-b204-94db972c364c@github.com>

On Mon, 31 Oct 2022 11:29:36 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available.
>>
>> It passes `hotspot:tier1` test suite
>
> @RealFYang let me know what you think. Thanks!

@luhenry : Sorry for the late reply. I am looking at this now.

-------------

PR: https://git.openjdk.org/jdk/pull/10884

From luhenry at openjdk.org  Tue Nov  1 08:41:21 2022
From: luhenry at openjdk.org (Ludovic Henry)
Date: Tue, 1 Nov 2022 08:41:21 GMT
Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V
In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com>
References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com>
Message-ID: <g-x7skL0Y9J8WTXL-orFbjQGepuqgI0_ST_duoSyrQg=.55fb8b7b-7bce-488a-a2cf-691c5adb4279@github.com>

On Thu, 27 Oct 2022 15:18:02 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

> The OpenJDK supports generating prefetch instructions on most platforms.
RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. > > It passes `hotspot:tier1` test suite @realfyang perfect, thank you ------------- PR: https://git.openjdk.org/jdk/pull/10884 From redestad at openjdk.org Tue Nov 1 09:01:30 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 1 Nov 2022 09:01:30 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v4] In-Reply-To: <AIV-RBmlKxSSNFeQdyLsXmWWE3ePgAeR1ugrk5CS0Fs=.5f9e08ea-78ca-4e2d-b0b3-c27c27885b1b@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> <ItgEUpQqe5_lJReITrZWgr5SbqUS1kAanJsyqzhzNuE=.2b5b0e1e-1dc4-4cf3-8d0e-a61d8a4c439c@github.com> <L2WAjZaNyLgTJp85sijndF0GLYMMAPObB3ncl3u-U6w=.b2d6a5e6-8aa0-4328-b11e-b258d78e9f4b@github.com> <yyuuqDrwV-WVFKe6dUzIp_Eo_z4m_GKQoYx-pJna3CM=.6ef3d30c-4b2a-4e8d-ba1d-3776e9eb8147@github.com> <AIV-RBmlKxSSNFeQdyLsXmWWE3ePgAeR1ugrk5CS0Fs=.5f9e08ea-78ca-4e2d-b0b3-c27c27885b1b@github.com> Message-ID: <bRVIvXxfYUaTgDESknvKXPWFApd6fKIWc-NogD69-Ro=.9724da0e-4a95-4856-a2bf-fe38e1389451@github.com> On Tue, 1 Nov 2022 06:17:16 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> Ok, I can try rewriting as @merykitty suggests and compare. I'm running out of time to spend on this right now, though, so I sort of hope we can do this experiment as a follow-up RFE. > > @cl4es i can write the assembly and send it your way if you want @luhenry if you have some time to translate that snippet then feel free! 
-------------

PR: https://git.openjdk.org/jdk/pull/10847

From kvn at openjdk.org  Tue Nov  1 17:46:43 2022
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 1 Nov 2022 17:46:43 GMT
Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references
In-Reply-To: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com>
References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com>
Message-ID: <H7qFAMewkLJ4IWrVipLemv1iBgI5qrWOM1pfJ0p6hGk=.315fce9d-2c7d-42b8-a569-c74d8c7097f2@github.com>

On Mon, 31 Oct 2022 14:40:32 GMT, Andrew Haley <aph at openjdk.org> wrote:

> This patch fixes the remaining null pointer dereference bugs that I know of.
>
> For the main bug, C2 was using a null reference to indicate an uninitialized `Node_List`. I replaced the null reference with a static sentinel.
>
> I also turned on `-fsanitize=null` and found and fixed a bunch of other null pointer dereferences. With this, I have run a full bootstrap and tier1 tests with `-fsanitize=null` enabled.
>
> I have checked that the code generated by GCC is not worse in any significant way, so I don't expect to see any performance regressions.
>
> I'd like to enable `-fsanitize=null` in debug builds to prevent regressions in this area. What do you think?

Changes are good.

Can you tell more about the effect of `-fsanitize=null` on libjvm size and on the performance of the fastdebug build we use in testing? If it is only a few percent I am for enabling it in the debug build.

-------------

Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10920 From matsaave at openjdk.org Tue Nov 1 18:35:42 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 1 Nov 2022 18:35:42 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v6] In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: <zKINLOkBVpUwjlgj2aznBnSpZLI5z81EsUTC6xCvx-c=.40b5f108-8cb6-402f-92d0-5a6510271a3b@github.com> > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. > > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 <Concat0.s/Ljava/lang/String;> > 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> > BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp 
index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 <Concat0.d/Ljava/lang/String;> > 18 iinc #1 1 > 21 goto 2 > 24 return Matias Saavedra Silva has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Added gtest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10860/files - new: https://git.openjdk.org/jdk/pull/10860/files/83a6cced..f2cc1104 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=04-05 Stats: 157 lines in 4 files changed: 81 ins; 64 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From iklam at openjdk.org Tue Nov 1 21:08:26 2022 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 1 Nov 2022 21:08:26 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v6] In-Reply-To: <zKINLOkBVpUwjlgj2aznBnSpZLI5z81EsUTC6xCvx-c=.40b5f108-8cb6-402f-92d0-5a6510271a3b@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> <zKINLOkBVpUwjlgj2aznBnSpZLI5z81EsUTC6xCvx-c=.40b5f108-8cb6-402f-92d0-5a6510271a3b@github.com> Message-ID: <zJscKEkFsKwwe19iTWRsXnzYbRFmgu53aY7WQS5fKi8=.11639379-71c2-408e-96c1-8ab3e3ed2fc3@github.com> On Tue, 1 Nov 2022 18:35:42 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote: >> As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. >> >> The text format and contents are tentative, please review. 
>> >> Here is an example output when using `findmethod()`: >> >> "Executing findmethod" >> flags (bitmask): >> 0x01 - print names of methods >> 0x02 - print bytecodes >> 0x04 - print the address of bytecodes >> 0x08 - print info for invokedynamic >> 0x10 - print info for invokehandle >> >> [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} >> 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V >> 0 iconst_0 >> 1 istore_1 >> 2 iload_1 >> 3 iconst_2 >> 4 if_icmpge 24 >> 7 getstatic 7 <Concat0.s/Ljava/lang/String;> >> 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> >> BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> >> arguments[1] = { >> 000 >> } >> ConstantPoolCacheEntry: 4 >> - this: 0x00007fffa0400570 >> - bytecode 1: invokedynamic ba >> - bytecode 2: nop 00 >> - cp index: 13 >> - F1: [ 0x00000008000c8658] >> - F2: [ 0x0000000000000003] >> - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) >> - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] >> - tos: object >> - local signature: 1 >> - has appendix: 1 >> - forced virtual: 0 >> - final: 1 >> - virtual Final: 0 >> - resolution Failed: 0 >> - num Parameters: 02 >> Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; >> appendix: java.lang.invoke.BoundMethodHandle$Species_LL >> {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' >> - ---- fields (total size 5 words): >> - private 'customizationCount' 'B' @12 0 (0x00) >> - private volatile 'updateInProgress' 'Z' @13 false (0x00) >> - private 
final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) >> - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) >> - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) >> - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) >> - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) >> - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) >> ------------- >> 15 putstatic 17 <Concat0.d/Ljava/lang/String;> >> 18 iinc #1 1 >> 21 goto 2 >> 24 return > > Matias Saavedra Silva has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Added gtest Changes requested by iklam (Reviewer). src/hotspot/share/oops/cpCache.cpp line 639: > 637: st->print_cr(" - F2: [ " PTR_FORMAT "]", (intptr_t)_f2); > 638: st->print_cr(" - Method: " INTPTR_FORMAT " %s", p2i(m), m != nullptr ? m->external_name() : nullptr); > 639: st->print_cr(" - flag values: [%02x|0|0|%01x|%01x|%01x|%01x|0|%01x|%01x|00|00|%02x]", To be consistent with the other output, `method` should be lower case and should be preceded by a single space. src/hotspot/share/oops/cpCache.cpp line 643: > 641: is_forced_virtual(), is_final(), is_vfinal(), > 642: indy_resolution_failed(), parameter_size()); > 643: st->print_cr(" - tos: %s\n - local signature: %01x\n" The parameters should be aligned after the open parenthesis. 
Please fix the other cases as well. You should keep the formatting consistent with the rest of the file.

    st->print_cr(" - flag values: [%02x|0|0|%01x|%01x|%01x|%01x|0|%01x|%01x|00|00|%02x]",
                 flag_state(), has_local_signature(), has_appendix(),
                 is_forced_virtual(), is_final(), is_vfinal(),
                 indy_resolution_failed(), parameter_size());

src/hotspot/share/oops/cpCache.cpp line 651:

> 649:                    indy_resolution_failed(), parameter_size());
> 650:   if ((bytecode_1() == Bytecodes::_invokehandle ||
> 651:        bytecode_1() == Bytecodes::_invokedynamic)) {

Should be:

    if (bytecode_1() == Bytecodes::_invokehandle ||
        bytecode_1() == Bytecodes::_invokedynamic) {

src/hotspot/share/oops/cpCache.cpp line 654:

> 652:     oop appendix = appendix_if_resolved(cph);
> 653:     if (m != nullptr) {
> 654:       st->print_cr(" Method%s: " INTPTR_FORMAT " %s.%s%s",

The method is already printed above, so there's no need to print it again. The only difference between here and the above is the "(native)" part, which is unimportant and can be removed.

test/hotspot/gtest/oops/test_cpCache_output.cpp line 3:

> 1: /*
> 2:  * Copyright (c) 2016, 2022, Oracle and/or its affiliates. All rights reserved.
> 3:  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.

Copyright should be 2022 only (the comma is required after 2022).

     * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.

test/hotspot/gtest/oops/test_cpCache_output.cpp line 66:

> 64:   ASSERT_TRUE(strstr(output, "volatile:") != NULL) << "must have volatile flag";
> 65:   ASSERT_TRUE(strstr(output, "field index:") != NULL) << "must have field index";
> 66: }

Need to add a newline at the end of the file.
------------- PR: https://git.openjdk.org/jdk/pull/10860 From matsaave at openjdk.org Tue Nov 1 21:52:51 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 1 Nov 2022 21:52:51 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v7] In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: <d_XrnfLZDcM4iA3cFCVQavDi9hRJac2JK3zzyk2kCHQ=.af6d38b8-9934-4657-b6d8-f162f7696e94@github.com> > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. > > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 <Concat0.s/Ljava/lang/String;> > 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> > BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 
00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 <Concat0.d/Ljava/lang/String;> > 18 iinc #1 1 > 21 goto 2 > 24 return Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Improved formatting ------------- Changes: - all: 
https://git.openjdk.org/jdk/pull/10860/files - new: https://git.openjdk.org/jdk/pull/10860/files/f2cc1104..03154267 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=05-06 Stats: 23 lines in 2 files changed: 0 ins; 7 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From iklam at openjdk.org Tue Nov 1 22:41:28 2022 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 1 Nov 2022 22:41:28 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v7] In-Reply-To: <d_XrnfLZDcM4iA3cFCVQavDi9hRJac2JK3zzyk2kCHQ=.af6d38b8-9934-4657-b6d8-f162f7696e94@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> <d_XrnfLZDcM4iA3cFCVQavDi9hRJac2JK3zzyk2kCHQ=.af6d38b8-9934-4657-b6d8-f162f7696e94@github.com> Message-ID: <4LaSwEeg70cDs2SuUteybEKqCXdh9O9RsG4_YLB4fnw=.5e791725-934d-4ed5-8204-a48067951f2c@github.com> On Tue, 1 Nov 2022 21:52:51 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote: >> As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. >> >> The text format and contents are tentative, please review. 
>> >> Here is an example output when using `findmethod()`: >> >> "Executing findmethod" >> flags (bitmask): >> 0x01 - print names of methods >> 0x02 - print bytecodes >> 0x04 - print the address of bytecodes >> 0x08 - print info for invokedynamic >> 0x10 - print info for invokehandle >> >> [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} >> 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V >> 0 iconst_0 >> 1 istore_1 >> 2 iload_1 >> 3 iconst_2 >> 4 if_icmpge 24 >> 7 getstatic 7 <Concat0.s/Ljava/lang/String;> >> 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> >> BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> >> arguments[1] = { >> 000 >> } >> ConstantPoolCacheEntry: 4 >> - this: 0x00007fffa0400570 >> - bytecode 1: invokedynamic ba >> - bytecode 2: nop 00 >> - cp index: 13 >> - F1: [ 0x00000008000c8658] >> - F2: [ 0x0000000000000003] >> - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) >> - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] >> - tos: object >> - local signature: 1 >> - has appendix: 1 >> - forced virtual: 0 >> - final: 1 >> - virtual Final: 0 >> - resolution Failed: 0 >> - num Parameters: 02 >> Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; >> appendix: java.lang.invoke.BoundMethodHandle$Species_LL >> {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' >> - ---- fields (total size 5 words): >> - private 'customizationCount' 'B' @12 0 (0x00) >> - private volatile 'updateInProgress' 'Z' @13 false (0x00) >> - private 
final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) >> - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) >> - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) >> - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) >> - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) >> - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) >> ------------- >> 15 putstatic 17 <Concat0.d/Ljava/lang/String;> >> 18 iinc #1 1 >> 21 goto 2 >> 24 return > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Improved formatting LGTM. Just a minor nit. src/hotspot/share/oops/cpCache.cpp line 659: > 657: } > 658: } > 659: else { Minor nit: the else should be combined with the previous line: } else { ------------- Marked as reviewed by iklam (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10860 From vlivanov at openjdk.org Tue Nov 1 23:34:34 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Nov 2022 23:34:34 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6] In-Reply-To: <hX0rCTuGOWiCCGiKC7anEoiNosHutI1kiWb9Z-plUSY=.d7a8c833-f3dd-4f3a-98bf-1975109de48c@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <hX0rCTuGOWiCCGiKC7anEoiNosHutI1kiWb9Z-plUSY=.d7a8c833-f3dd-4f3a-98bf-1975109de48c@github.com> Message-ID: <lyp-yeLzhgxWaDGBB7GvJPgtMn5u_1k823-GZLIZPKM=.fbdc9545-5e30-4919-880c-f377b13d31ee@github.com> On Fri, 28 Oct 2022 20:39:44 GMT, vpaprotsk <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 
154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s > > vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: > > invalidkeyexception and some review comments src/hotspot/cpu/x86/macroAssembler_x86.hpp line 970: > 968: > 969: void addmq(int disp, Register r1, Register r2); > 970: All Poly1305-related methods can be moved to `StubGenerator`. They are used solely during stub creation. src/hotspot/cpu/x86/macroAssembler_x86_poly.cpp line 32: > 30: #include "macroAssembler_x86.hpp" > 31: > 32: #ifdef _LP64 You could rename the file to `macroAssembler_x86_64_poly.cpp` and get rid of `#ifdef _LP64`. Once you move the declarations to `StubGenerator`, it'll be `stubGenerator_x86_64_poly.cpp`. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2002: > 2000: } > 2001: > 2002: address StubGenerator::generate_poly1305_masksCP() { I suggest turning it into a C++ literal constant and moving the declaration next to `poly1305_process_blocks_avx512`, where it is used. As an example, here's how it is handled in GHASH stubs: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64_ghash.cpp#L35 That would simplify the code a bit (no need for `StubRoutines::x86::_poly1305_mask_addr`/`poly1305_mask_addr()` and no need to generate the constants during VM startup). You could split it into 3 constants, but then using a single base register (`polyCP`) won't work anymore. Thinking more about it, I'm not sure why you can't just do the split and use address literals instead to access individual constants (and repurpose `r13` to be used as a scratch register when RIP-relative addressing mode doesn't work).
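A minimal sketch of the "literal constant plus local accessor" shape suggested in the comment above, in plain C++. The names `kPoly1305Mask44` and `poly1305_mask44_addr` and the 44-bit mask value are illustrative assumptions, not taken from the patch. HotSpot code would presumably use its own `ATTRIBUTE_ALIGNED` macro and `address` type where this sketch uses `alignas` and `const unsigned char*`.

```cpp
#include <cstdint>

// Hypothetical constant: eight 64-bit lanes (one 512-bit vector), each lane
// holding a 44-bit mask. The value is an assumption chosen for illustration.
// It lives in static data, so nothing has to be generated at VM startup.
alignas(64) static constexpr std::uint64_t kPoly1305Mask44[8] = {
  0xfffffffffffULL, 0xfffffffffffULL, 0xfffffffffffULL, 0xfffffffffffULL,
  0xfffffffffffULL, 0xfffffffffffULL, 0xfffffffffffULL, 0xfffffffffffULL
};

// Hypothetical local accessor: performs the pointer cast once, so the places
// that need a raw byte address do not have to repeat it.
static inline const unsigned char* poly1305_mask44_addr() {
  return reinterpret_cast<const unsigned char*>(kPoly1305Mask44);
}
```

With this shape the table needs no stub-generation step and no entry in a shared routines table; only the file that uses it has to know it exists.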
src/hotspot/share/runtime/globals.hpp line 241: > 239: "Use intrinsics for java.util.Base64") \ > 240: \ > 241: product(bool, UsePolyIntrinsics, false, \ I'm not a fan of introducing new flags for individual intrinsics (there's already `-XX:DisableIntrinsic=_name` specifically for that), but since we already have many, shouldn't it be declared as a diagnostic flag, at least? ------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Tue Nov 1 23:34:34 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Nov 2022 23:34:34 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <Cju0XRYyqdeVeZZEpttxF9eUX02aQ0UM3CfD5uIe0OI=.6057ecb9-c80f-4eef-8aae-9f8748bc8f6c@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> <BKHPNZFMR60W5dFzqO3sps0PKpxX2ggEjnK6yqKT_RE=.e531ff46-43cb-4ed3-b3f9-103ab0c7a8e5@github.com> <Cju0XRYyqdeVeZZEpttxF9eUX02aQ0UM3CfD5uIe0OI=.6057ecb9-c80f-4eef-8aae-9f8748bc8f6c@github.com> Message-ID: <4HxTb1DtD6KeuYupOKf32GoQ7SV8_EjHcqfhiZhbLHM=.884e631a-1336-454d-aae1-06f85f784381@github.com> On Fri, 28 Oct 2022 20:19:35 GMT, vpaprotsk <duke at openjdk.org> wrote: > And just looking now on uops.info, they seem to have identical timings? Actual instruction being used (aligned vs unaligned versions) doesn't matter much here, because it's a dynamic property of the address being accessed: misaligned accesses that cross cache line boundary incur a penalty. Since the cache line size is 64 byte in size, every misaligned 512-bit access is penalized. 
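The point about misaligned 512-bit accesses can be checked with a few lines of arithmetic. This is only a sketch: `crosses_cache_line` and `kCacheLineSize` are names made up for the example, and the 64-byte line size is the one assumed in the message above.

```cpp
#include <cstddef>
#include <cstdint>

// Cache line size assumed in the discussion above (64 bytes).
constexpr std::uintptr_t kCacheLineSize = 64;

// Returns true if an access of `size` bytes starting at `addr` touches more
// than one cache line, i.e. its first and last byte fall in different lines.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
  return size != 0 &&
         (addr / kCacheLineSize) != ((addr + size - 1) / kCacheLineSize);
}
```

A 64-byte access fits in one line only when it starts on a 64-byte boundary, so `crosses_cache_line(8, 64)` is true while `crosses_cache_line(0, 64)` is false. A 32-byte access at offset 8 stays inside its line, which is why narrower vectors are split less often.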
------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Tue Nov 1 23:51:27 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Nov 2022 23:51:27 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6] In-Reply-To: <lyp-yeLzhgxWaDGBB7GvJPgtMn5u_1k823-GZLIZPKM=.fbdc9545-5e30-4919-880c-f377b13d31ee@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <hX0rCTuGOWiCCGiKC7anEoiNosHutI1kiWb9Z-plUSY=.d7a8c833-f3dd-4f3a-98bf-1975109de48c@github.com> <lyp-yeLzhgxWaDGBB7GvJPgtMn5u_1k823-GZLIZPKM=.fbdc9545-5e30-4919-880c-f377b13d31ee@github.com> Message-ID: <xVrQBy6-ShiXGeObsxEoYonfCQ8r7f7VebUjJI1zP64=.5ffcb66d-5662-4fd0-8b51-7ca621a69757@github.com> On Tue, 1 Nov 2022 23:17:46 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: >> >> invalidkeyexception and some review comments > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2002: > >> 2000: } >> 2001: >> 2002: address StubGenerator::generate_poly1305_masksCP() { > > I suggest to turn it into a C++ literal constant and move the declaration next to `poly1305_process_blocks_avx512` where they are used. As an example, here's how it is handled in GHASH stubs: > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64_ghash.cpp#L35 > > That would allow to avoid to simplify the code a bit (no need in `StubRoutines::x86::_poly1305_mask_addr`/`poly1305_mask_addr()` and no need to generate the constants during VM startup). > > You could split it into 3 constants, but then using a single base register (`polyCP`) won't work anymore. 
> Thinking more about it, I'm not sure why you can't just do the split and use address literals instead to access individual constants (and repurpose `r13` to be used as a scratch register when RIP-relative addressing mode doesn't work). The case of AES stubs may be even a better fit here: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp#L47 It doesn't use/introduce any shared constants, so declaring a constant and a local accessor (to save on pointer to address casts at use sites) is enough. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Wed Nov 2 00:04:41 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Nov 2022 00:04:41 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references In-Reply-To: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> Message-ID: <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> On Mon, 31 Oct 2022 14:40:32 GMT, Andrew Haley <aph at openjdk.org> wrote: > This patch fixes the remaining null pointer dereference bugs that I know of. > > For the main bug, C2 was using a null reference to indicate an uninitialized `Node_List`. I replaced the null reference with a static sentinel. > > I also turned on `-fsanitize=null` and found and fixed a bunch of other null pointer dereferences. With this,I have run a full bootstrap and tier1 tests with `-fsanitize=null` enabled. > > I have checked that the code generated by GCC is not worse in any significant way, so I don't expect to see any performance regressions. > > I'd like to enable `-fsanitize=null` in debug builds to prevent regressions in this area. What do you think? Minor comments/suggestions. Otherwise, looks good. 
src/hotspot/share/oops/instanceKlass.cpp line 390: > 388: // Record dependency to keep nest host from being unloaded before this class. > 389: ClassLoaderData* this_key = class_loader_data(); > 390: if (this_key != NULL) { The code assumes `this_key != NULL`. Do we need an assert/guarantee here? src/hotspot/share/opto/bytecodeInfo.cpp line 66: > 64: assert(!caller_jvms->should_reexecute(), "there should be no reexecute bytecode with inlining"); > 65: } > 66: assert(_caller_jvms == NULL I'd reshape the code and either get rid of `_caller_jvms` initialization on line 47 or replace it with `_caller_jvms(NULL),`. Then, I'd guard `_caller_jvms` initialization by `caller_jvms != NULL` and move the assert under the guard: if (caller_jvms != NULL) { // Keep a private copy of the caller_jvms: _caller_jvms = new (C) JVMState(caller_jvms->method(), caller_tree->caller_jvms()); _caller_jvms->set_bci(caller_jvms->bci()); assert(!caller_jvms->should_reexecute(), "there should be no reexecute bytecode with inlining"); assert(caller_jvms->same_calls_as(_caller_jvms), "consistent JVMS"); } Or introduce a helper method which does a shallow copy of `caller_jvms` as part of initializing store on line 47. src/hotspot/share/opto/node.hpp line 1528: > 1526: public: > 1527: Node_Array(Arena* a, uint max = OptoNodeListSize) : _a(a), _max(max) { > 1528: if (a != NULL) { Add `assert(a != NULL, "...")` here? ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10920 From duke at openjdk.org Wed Nov 2 02:38:26 2022 From: duke at openjdk.org (vpaprotsk) Date: Wed, 2 Nov 2022 02:38:26 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6] In-Reply-To: <xVrQBy6-ShiXGeObsxEoYonfCQ8r7f7VebUjJI1zP64=.5ffcb66d-5662-4fd0-8b51-7ca621a69757@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <hX0rCTuGOWiCCGiKC7anEoiNosHutI1kiWb9Z-plUSY=.d7a8c833-f3dd-4f3a-98bf-1975109de48c@github.com> <lyp-yeLzhgxWaDGBB7GvJPgtMn5u_1k823-GZLIZPKM=.fbdc9545-5e30-4919-880c-f377b13d31ee@github.com> <xVrQBy6-ShiXGeObsxEoYonfCQ8r7f7VebUjJI1zP64=.5ffcb66d-5662-4fd0-8b51-7ca621a69757@github.com> Message-ID: <lCes6gQoLqfh4z11XRcIFER_tou6ul62vcVWjW8bTdY=.fc739f71-b362-4a03-8173-b36161610457@github.com> On Tue, 1 Nov 2022 23:49:17 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2002: >> >>> 2000: } >>> 2001: >>> 2002: address StubGenerator::generate_poly1305_masksCP() { >> >> I suggest to turn it into a C++ literal constant and move the declaration next to `poly1305_process_blocks_avx512` where they are used. As an example, here's how it is handled in GHASH stubs: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64_ghash.cpp#L35 >> >> That would allow to avoid to simplify the code a bit (no need in `StubRoutines::x86::_poly1305_mask_addr`/`poly1305_mask_addr()` and no need to generate the constants during VM startup). >> >> You could split it into 3 constants, but then using a single base register (`polyCP`) won't work anymore. >> Thinking more about it, I'm not sure why you can't just do the split and use address literals instead to access individual constants (and repurpose `r13` to be used as a scratch register when RIP-relative addressing mode doesn't work). 
> > The case of AES stubs may be even a better fit here: > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp#L47 > > It doesn't use/introduce any shared constants, so declaring a constant and a local accessor (to save on pointer to address casts at use sites) is enough. I wonder if I can remove that function completely now.. Originally I kept those in memory, because I was rather tight on zmm registers (actually, all registers), and I could use the `Address` version of instructions to save a register.. But I had done a major cleanup on register allocation before pushing the PR, maybe there is room now. (But if we do want to bring back any of the optimizations I kept back, we would need those registers again.. but will see) PS: I am trying to address the 10% degradation @jnimeh and I discussed above, will take a few days to implement the latest round. Apologies for the delay and appreciate the review! ------------- PR: https://git.openjdk.org/jdk/pull/10582 From jbhateja at openjdk.org Wed Nov 2 03:19:04 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Nov 2022 03:19:04 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <4HxTb1DtD6KeuYupOKf32GoQ7SV8_EjHcqfhiZhbLHM=.884e631a-1336-454d-aae1-06f85f784381@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> <BKHPNZFMR60W5dFzqO3sps0PKpxX2ggEjnK6yqKT_RE=.e531ff46-43cb-4ed3-b3f9-103ab0c7a8e5@github.com> <Cju0XRYyqdeVeZZEpttxF9eUX02aQ0UM3CfD5uIe0OI=.6057ecb9-c80f-4eef-8aae-9f8748bc8f6c@github.com> <4HxTb1DtD6KeuYupOKf32GoQ7SV8_EjHcqfhiZhbLHM=.884e631a-1336-454d-aae1-06f85f784381@github.com> Message-ID: <mymD7nKP6xLz1XoansrlVbhxo6EK0Zefc5OJ_WFyf3g=.33270ab8-22af-4cc3-b0ca-eb364624925a@github.com> On Tue, 1 Nov 2022 23:04:45 GMT, Vladimir Ivanov <vlivanov at
openjdk.org> wrote: >> Hmm.. interesting. Is this for loading? `evmovdquq` vs `evmovdqaq`? I was actually looking at using evmovdqaq but there is no encoding for it yet (And just looking now on uops.info, they seem to have identical timings? perhaps their measurements are off..). There are quite a few optimizations I tried (and removed) here, but not this one.. >> >> Perhaps to have a record, while its relatively fresh in my mind.. since there is a 8-block (I deleted a 16-block vector multiply), one can have a peeled off version for just 256 as the minimum payload.. In that case we only need R^1..R^8, (not R^1..R^16). I also tried loop stride of 8 blocks instead of 16, but that gets quite bit slower (20ish%?).. There was also a version that did a much better interleaving of multiplication and loading of next message block into limbs.. There is potentially a better way to 'devolve' the vector loop at tail; ie. when 15-blocks are left, just do one more 8-block multiply, all the constants are already available.. >> >> I removed all of those eventually. Even then, the assembler code currently is already fairly complex. The extra pre-, post-processing and if cases, I was struggling to keep up myself. Maybe code cleanup would have helped, so it _is_ possible to bring some of that back in for extra 10+%? (There is a branch on my fork with that code) >> >> I guess that's my long way of saying 'I don't want to complicate the assembler loop'? > >> And just looking now on uops.info, they seem to have identical timings? > > Actual instruction being used (aligned vs unaligned versions) doesn't matter much here, because it's a dynamic property of the address being accessed: misaligned accesses that cross cache line boundary incur a penalty. Since the cache line size is 64 byte in size, every misaligned 512-bit access is penalized. 
I collected performance counters for the benchmark included with the patch, and they show that around 30% of 64-byte loads span a cache line boundary. Performance counter stats for 'java -jar target/benchmarks.jar -f 1 -wi 1 -i 2 -w 30 -p dataSize=8192': 122385646614 cycles 328096538160 instructions # 2.68 insn per cycle 64530343063 MEM_INST_RETIRED.ALL_LOADS 22900705491 MEM_INST_RETIRED.ALL_STORES 19815558484 MEM_INST_RETIRED.SPLIT_LOADS 701176106 MEM_INST_RETIRED.SPLIT_STORES A scalar peel loop before the vector loop can avoid this penalty. We should also extend the scope of the optimization (preferably in this PR or in a subsequent one) to cover the [MAC computation routine accepting ByteBuffer](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java#L116). ------------- PR: https://git.openjdk.org/jdk/pull/10582 From epeter at openjdk.org Wed Nov 2 05:44:08 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Nov 2022 05:44:08 GMT Subject: RFR: 8279913: obsolete ExtendedDTraceProbes Message-ID: <MRTShFTi5PeLO9VkJBYdMQRrwQRDhuiCY8U6-g0E-Ww=.0b0bb1d7-1edf-47af-8e53-4dbac100abc2@github.com> Obsoleted ExtendedDTraceProbes. Removed all uses of the flag, it now shows this warning when used: `Ignoring option ExtendedDTraceProbes; support was removed in 20.0` Documentation was already changed in [JDK-8279047](https://bugs.openjdk.org/browse/JDK-8279047) Verified warning message in dtrace build, and regular build. Ran automatic regression tests.
------------- Commit messages: - remove flag from a test - 8279913: obsolete ExtendedDTraceProbes Changes: https://git.openjdk.org/jdk/pull/10930/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10930&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8279913 Stats: 39 lines in 5 files changed: 0 ins; 34 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10930.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10930/head:pull/10930 PR: https://git.openjdk.org/jdk/pull/10930 From fyang at openjdk.org Wed Nov 2 06:14:23 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 2 Nov 2022 06:14:23 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> Message-ID: <JY7zsETBysMQYv8KisuIwcpKMmtzEEZuHueOtG0BAEI=.06d4e02c-8c5f-46b3-943d-35d3e707ddf2@github.com> On Thu, 27 Oct 2022 15:18:02 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. > > It passes `hotspot:tier1` test suite src/hotspot/cpu/riscv/riscv.ad line 5197: > 5195: ins_encode %{ > 5196: __ addi(t0, as_Register($mem$$base), $mem$$disp); > 5197: __ andi(t0, t0, ~(CacheLineSize - 1)); Do we really need to align to CacheLineSize here and in generate_prefetch? I didn't see this requirement from the official CMO specification [1]. Could you please confirm that? Thanks. 
[1] https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.1.pdf ------------- PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Wed Nov 2 08:11:27 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 2 Nov 2022 08:11:27 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V In-Reply-To: <JY7zsETBysMQYv8KisuIwcpKMmtzEEZuHueOtG0BAEI=.06d4e02c-8c5f-46b3-943d-35d3e707ddf2@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <JY7zsETBysMQYv8KisuIwcpKMmtzEEZuHueOtG0BAEI=.06d4e02c-8c5f-46b3-943d-35d3e707ddf2@github.com> Message-ID: <0_IAq_W24mlGdEQOSgPW0taFF0zEE00NSD6hPZunPmA=.f1bf03cf-4afa-4c93-bad7-c57c55631234@github.com> On Wed, 2 Nov 2022 06:11:33 GMT, Fei Yang <fyang at openjdk.org> wrote: >> The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. >> >> It passes `hotspot:tier1` test suite > > src/hotspot/cpu/riscv/riscv.ad line 5197: > >> 5195: ins_encode %{ >> 5196: __ addi(t0, as_Register($mem$$base), $mem$$disp); >> 5197: __ andi(t0, t0, ~(CacheLineSize - 1)); > > Do we really need to align to CacheLineSize here and in generate_prefetch? > I didn't see this requirement from the official CMO specification [1]. Could you please confirm that? Thanks. > > [1] https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.1.pdf It's indeed not a requirement, let me remove that `andi`. 
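For reference, the effect of the `andi(t0, t0, ~(CacheLineSize - 1))` being removed can be sketched in plain C++. The helper names are made up for the example, and the 64-byte line size in the usage note is an assumption (the real `CacheLineSize` is platform dependent). The sketch also assumes, as the exchange above suggests, that a prefetch acts on the cache block containing the address it is given.

```cpp
#include <cstdint>

// Rounds `addr` down to the start of its cache line by clearing the low bits.
// `line_size` must be a power of two for the mask to be valid.
constexpr std::uintptr_t align_down_to_line(std::uintptr_t addr,
                                            std::uintptr_t line_size) {
  return addr & ~(line_size - 1);
}

// True if both addresses fall in the same cache line.
constexpr bool in_same_line(std::uintptr_t a, std::uintptr_t b,
                            std::uintptr_t line_size) {
  return align_down_to_line(a, line_size) == align_down_to_line(b, line_size);
}
```

With a 64-byte line, `align_down_to_line(0x1234, 64)` is `0x1200`, and `0x1234` and `0x1200` are in the same line. Under the stated assumption, prefetching either address touches the same block, so the extra instruction buys nothing.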
------------- PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Wed Nov 2 08:19:41 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 2 Nov 2022 08:19:41 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v2] In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> Message-ID: <KJRBGLyEsrMHOwR4lQVvm-jr2GWHu_glKKNvb7PF7q0=.9d71a4d8-21d8-450b-8c4b-aed36bdb3cbb@github.com> > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. > > It passes `hotspot:tier1` test suite Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: Remove uncessary cache line alignement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10884/files - new: https://git.openjdk.org/jdk/pull/10884/files/aadfa41e..e968f716 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=00-01 Stats: 4 lines in 2 files changed: 3 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10884.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10884/head:pull/10884 PR: https://git.openjdk.org/jdk/pull/10884 From tschatzl at openjdk.org Wed Nov 2 08:24:33 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 2 Nov 2022 08:24:33 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v5] In-Reply-To: <E_mwRHEo8ShqwmIT0D5PRe29l8b0893IkuLNJJesMm4=.4d58066f-04ad-4250-93f2-e829bbaa827f@github.com> References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> 
<E_mwRHEo8ShqwmIT0D5PRe29l8b0893IkuLNJJesMm4=.4d58066f-04ad-4250-93f2-e829bbaa827f@github.com> Message-ID: <0N_OOeZIQEcOLHCjYzubyYGQnxV4SVcw86wGsrUMUI8=.88faa38a-1785-4c1f-9793-ee47bd85e5cc@github.com> On Wed, 26 Oct 2022 16:27:31 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote: >> Hi, >> >> Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. >> >> Usecase is in parallelizing the merging of large remsets for G1. >> >> Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). >> >> Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. >> This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). >> >> This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. >> >> Testing: tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Robbin suggestion to use BucketsOperation Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10759 From thartmann at openjdk.org Wed Nov 2 08:55:27 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Nov 2022 08:55:27 GMT Subject: RFR: 8279913: obsolete ExtendedDTraceProbes In-Reply-To: <MRTShFTi5PeLO9VkJBYdMQRrwQRDhuiCY8U6-g0E-Ww=.0b0bb1d7-1edf-47af-8e53-4dbac100abc2@github.com> References: <MRTShFTi5PeLO9VkJBYdMQRrwQRDhuiCY8U6-g0E-Ww=.0b0bb1d7-1edf-47af-8e53-4dbac100abc2@github.com> Message-ID: <jOfxFEadQXT3IINQMr5s4-Ep_5KIePtl8ictr2Y8YN8=.2aa384be-ac4d-4d3f-be2b-e1f413b0d54f@github.com> On Tue, 1 Nov 2022 10:38:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote: > Obsoleted ExtendedDTraceProbes. > Removed all uses of the flag, it now shows this warning when used: > `Ignoring option ExtendedDTraceProbes; support was removed in 20.0` > > Documentation was already changed in [JDK-8279047](https://bugs.openjdk.org/browse/JDK-8279047) > > Verified warning message in dtrace build, and regular build. > Ran automatic regression tests. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/10930 From fyang at openjdk.org Wed Nov 2 09:03:52 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 2 Nov 2022 09:03:52 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v2] In-Reply-To: <KJRBGLyEsrMHOwR4lQVvm-jr2GWHu_glKKNvb7PF7q0=.9d71a4d8-21d8-450b-8c4b-aed36bdb3cbb@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <KJRBGLyEsrMHOwR4lQVvm-jr2GWHu_glKKNvb7PF7q0=.9d71a4d8-21d8-450b-8c4b-aed36bdb3cbb@github.com> Message-ID: <YGN0d-bKsd_caM3cSnqHFI-rWtczvFJ3nwAynvoxdmw=.3a7ec3d1-6e89-45d8-b148-d2906fd877b5@github.com> On Wed, 2 Nov 2022 08:19:41 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. 
We want to make sure we use these instructions whenever they are available. >> >> It passes `hotspot:tier1` test suite > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > Remove uncessary cache line alignement src/hotspot/cpu/riscv/riscv.ad line 5196: > 5194: > 5195: ins_encode %{ > 5196: __ addi(t0, as_Register($mem$$base), $mem$$disp); This might be further improved, as I see prefetch instructions can take an immediate offset. src/hotspot/os_cpu/linux_riscv/prefetch_linux_riscv.inline.hpp line 36: > 34: (void (*)(const void*, intptr_t))StubRoutines::riscv::prefetch_r(); > 35: if (interval >= 0 && stub != NULL) { > 36: stub(loc, interval); I am not sure it is really worth calling a stub for read / write here. The case the stub tries to catch and resolve does not look like a big issue to me. And I see aarch64 simply plants a 'prfm' instruction for prefetching [1]. I guess we could do the same? [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/prefetch_linux_aarch64.inline.hpp#L34 ------------- PR: https://git.openjdk.org/jdk/pull/10884 From chagedorn at openjdk.org Wed Nov 2 09:07:24 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Nov 2022 09:07:24 GMT Subject: RFR: 8279913: obsolete ExtendedDTraceProbes In-Reply-To: <MRTShFTi5PeLO9VkJBYdMQRrwQRDhuiCY8U6-g0E-Ww=.0b0bb1d7-1edf-47af-8e53-4dbac100abc2@github.com> References: <MRTShFTi5PeLO9VkJBYdMQRrwQRDhuiCY8U6-g0E-Ww=.0b0bb1d7-1edf-47af-8e53-4dbac100abc2@github.com> Message-ID: <9x5VGMEpHBCE1Qos_Rssx6HURoHpwYbop6nogZ5nmw8=.679c707f-7373-4b39-a93e-1f9e960d6a33@github.com> On Tue, 1 Nov 2022 10:38:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote: > Obsoleted ExtendedDTraceProbes.
> Removed all uses of the flag, it now shows this warning when used: > `Ignoring option ExtendedDTraceProbes; support was removed in 20.0` > > Documentation was already changed in [JDK-8279047](https://bugs.openjdk.org/browse/JDK-8279047) > > Verified warning message in dtrace build, and regular build. > Ran automatic regression tests. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10930 From chagedorn at openjdk.org Wed Nov 2 09:10:24 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Nov 2022 09:10:24 GMT Subject: RFR: 8295646: Ignore zero pairs in address descriptors read by dwarf parser [v2] In-Reply-To: <FL0MQgxTaupc3G1cdAHieFGke_2C8BiZEISkMlxni1E=.0eb546d0-abf6-41ca-be34-853086279b0d@github.com> References: <KM1VpwIO1alK_UYBLfx8cbzrbluZlHn1vxc7UoKWbhU=.d178cab1-bd7b-4282-b1b3-835e0a739a56@github.com> <FL0MQgxTaupc3G1cdAHieFGke_2C8BiZEISkMlxni1E=.0eb546d0-abf6-41ca-be34-853086279b0d@github.com> Message-ID: <obIWT-uSED0tZVO6TX8fZKDUes8OMAY1eqYfcNCqWNk=.b0357a73-7014-4e3a-b46b-b4ce8bfd2678@github.com> On Wed, 19 Oct 2022 12:11:30 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> RISC-V generates debuginfo like >> >> >>> readelf --debug-dump=aranges build/linux-riscv64-server-fastdebug/images/test/hotspot/gtest/server/libjvm.so >> >> ... >> Length: 1756 >> Version: 2 >> Offset into .debug_info: 0x4bc5e9 >> Pointer Size: 8 >> Segment Size: 0 >> >> Address Length >> 0000000000344ece 0000000000004a2c >> 0000000000000000 0000000000000000 <= >> 0000000000000000 0000000000000000 <= >> 0000000000000000 0000000000000000 <= >> 00000000003498fa 0000000000000016 >> 0000000000349910 0000000000000016 >> .... 
>> 000000000026d5b8 0000000000000b9a >> 000000000034a532 0000000000000628 >> 000000000034ab5a 00000000000002ac >> 0000000000000000 0000000000000000 <= >> 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 >> 000000000034ae06 0000000000000bee >> 000000000034b9f4 0000000000000660 >> 000000000034c054 00000000000005aa >> 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 <= >> 000000000034c5fe 0000000000000af2 >> 000000000034d0f0 0000000000000f16 >> 000000000034e006 0000000000000b4a >> 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 >> 000000000026e152 000000000000000e >> 0000000000000000 0000000000000000 >> >> >> Our dwarf parser (gdb's dwarf parser before this April is as well [1], which encountered the same issue on RISC-V) uses `address == 0 && size == 0` in `is_terminating_entry()` to detect terminations of an arange section, which will early terminate parsing RISC-V's debuginfo at an "apparent terminator" described in [1] so that the result would not look correct with tests failures. The `_header._unit_length` is read but not used and it is the real length that can determine the section's end, so we can use it to get the end position of a section instead of `address == 0 && size == 0` checks to fix this issue. >> >> Also, the reason why `readelf` has no such issue is it also uses the same approach to determine the end position. [2] >> >> Tests added along with the dwarf parser patch are all tested and passed on x86_64, aarch64, and riscv64. >> Running a tier1 sanity test now. >> >> Thanks, >> Xiaolin >> >> [1] https://github.com/bminor/binutils-gdb/commit/1a7c41d5ece7d0d1aa77d8019ee46f03181854fa >> [2] https://github.com/bminor/binutils-gdb/blob/fd320c4c29c9a1915d24a68a167a5fd6d2c27e60/binutils/dwarf.c#L7594 > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Add the assertion back Given that the change is small, I think it's okay. 
I've run some additional testing which looked good. ------------- PR: https://git.openjdk.org/jdk/pull/10758 From xlinzheng at openjdk.org Wed Nov 2 09:12:52 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 2 Nov 2022 09:12:52 GMT Subject: Integrated: 8295646: Ignore zero pairs in address descriptors read by dwarf parser In-Reply-To: <KM1VpwIO1alK_UYBLfx8cbzrbluZlHn1vxc7UoKWbhU=.d178cab1-bd7b-4282-b1b3-835e0a739a56@github.com> References: <KM1VpwIO1alK_UYBLfx8cbzrbluZlHn1vxc7UoKWbhU=.d178cab1-bd7b-4282-b1b3-835e0a739a56@github.com> Message-ID: <e-wazEe_-g3D7W9PEa43a3Z6b1rpYjtSNJgN9KOM6wE=.81be0c85-53bf-46e3-8521-0c670b47bdfa@github.com> On Wed, 19 Oct 2022 08:22:01 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: > RISC-V generates debuginfo like > > >> readelf --debug-dump=aranges build/linux-riscv64-server-fastdebug/images/test/hotspot/gtest/server/libjvm.so > > ... > Length: 1756 > Version: 2 > Offset into .debug_info: 0x4bc5e9 > Pointer Size: 8 > Segment Size: 0 > > Address Length > 0000000000344ece 0000000000004a2c > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 <= > 00000000003498fa 0000000000000016 > 0000000000349910 0000000000000016 > .... 
> 000000000026d5b8 0000000000000b9a > 000000000034a532 0000000000000628 > 000000000034ab5a 00000000000002ac > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 > 000000000034ae06 0000000000000bee > 000000000034b9f4 0000000000000660 > 000000000034c054 00000000000005aa > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 <= > 000000000034c5fe 0000000000000af2 > 000000000034d0f0 0000000000000f16 > 000000000034e006 0000000000000b4a > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 > 000000000026e152 000000000000000e > 0000000000000000 0000000000000000 > > > Our dwarf parser (gdb's dwarf parser before this April is as well [1], which encountered the same issue on RISC-V) uses `address == 0 && size == 0` in `is_terminating_entry()` to detect terminations of an arange section, which will early terminate parsing RISC-V's debuginfo at an "apparent terminator" described in [1] so that the result would not look correct with tests failures. The `_header._unit_length` is read but not used and it is the real length that can determine the section's end, so we can use it to get the end position of a section instead of `address == 0 && size == 0` checks to fix this issue. > > Also, the reason why `readelf` has no such issue is it also uses the same approach to determine the end position. [2] > > Tests added along with the dwarf parser patch are all tested and passed on x86_64, aarch64, and riscv64. > Running a tier1 sanity test now. > > Thanks, > Xiaolin > > [1] https://github.com/bminor/binutils-gdb/commit/1a7c41d5ece7d0d1aa77d8019ee46f03181854fa > [2] https://github.com/bminor/binutils-gdb/blob/fd320c4c29c9a1915d24a68a167a5fd6d2c27e60/binutils/dwarf.c#L7594 This pull request has now been integrated. 
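The difference between the two termination strategies described in the quoted summary can be sketched as follows. This is a simplified model, not the HotSpot parser: descriptors are assumed to be already decoded into a hypothetical `AddressDescriptor` array, and `tuple_slots_dwarf32_addr8` assumes the 32-bit DWARF format with 8-byte addresses, where the set header takes 12 bytes and is padded to 16 so that tuples are 16-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical decoded form of one (address, length) tuple in an arange set.
struct AddressDescriptor {
  std::uint64_t address;
  std::uint64_t length;
};

// Number of 16-byte tuple slots in a set, derived from the header's
// unit_length (which does not count its own 4 bytes). Assumes DWARF32 and
// 8-byte addresses: the set spans 4 + unit_length bytes, 16 of them header.
constexpr std::size_t tuple_slots_dwarf32_addr8(std::uint32_t unit_length) {
  return (static_cast<std::uint64_t>(unit_length) + 4 < 16)
             ? 0
             : static_cast<std::size_t>(
                   (static_cast<std::uint64_t>(unit_length) + 4 - 16) / 16);
}

// Terminator-based scan: stops at the first (0, 0) pair, so it gives up early
// when zero pairs appear in the middle of a set.
constexpr std::size_t count_until_zero_pair(const AddressDescriptor* d,
                                            std::size_t slots) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < slots; i++) {
    if (d[i].address == 0 && d[i].length == 0) break;
    count++;
  }
  return count;
}

// Length-bounded scan: visits every slot implied by unit_length and simply
// ignores zero pairs, including the real terminator at the end.
constexpr std::size_t count_skipping_zero_pairs(const AddressDescriptor* d,
                                                std::size_t slots) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < slots; i++) {
    if (d[i].address == 0 && d[i].length == 0) continue;
    count++;
  }
  return count;
}

// First rows of the dump quoted above, followed by a terminating pair.
constexpr AddressDescriptor kSampleSet[] = {
  {0x344ece, 0x4a2c}, {0, 0}, {0, 0}, {0, 0},
  {0x3498fa, 0x16}, {0x349910, 0x16}, {0, 0}
};
```

On this sample the terminator-based scan sees one range, while the length-bounded scan sees all three. For the `Length: 1756` set in the dump, the helper gives 109 tuple slots under the stated assumptions.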
Changeset: 2634eff2 Author: Xiaolin Zheng <xlinzheng at openjdk.org> Committer: Christian Hagedorn <chagedorn at openjdk.org> URL: https://git.openjdk.org/jdk/commit/2634eff24fde2760a72b607095412eef9955919e Stats: 16 lines in 2 files changed: 11 ins; 0 del; 5 mod 8295646: Ignore zero pairs in address descriptors read by dwarf parser Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/10758 From fjiang at openjdk.org Wed Nov 2 09:41:38 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 2 Nov 2022 09:41:38 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V In-Reply-To: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> Message-ID: <pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> On Mon, 31 Oct 2022 12:41:28 GMT, Fei Yang <fyang at openjdk.org> wrote: > Hi, > > Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. > > This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. > Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't > affect the rest of the world in theory. > > There exists some differences in frame structure between AArch64 and RISC-V. > For AArch64, we have: > > enum { > link_offset = 0, > return_addr_offset = 1, > sender_sp_offset = 2 > }; > > While for RISC-V, we have: > > enum { > link_offset = -2, > return_addr_offset = -1, > sender_sp_offset = 0 > }; > > So we need adapations in some places where the code relies on value of sender_sp_offset to work. > Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to > evaluate more on its impact on performance. > > Testing on Linux-riscv64 HiFive Unmatched board: > - Minimal, Client and Server release & fastdebug build OK. 
> - Passed tier1-tier4 tests (release build). > - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). > - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). Thanks for the great work! I have reviewed the `cpu/riscv` part, and here are my comments. src/hotspot/cpu/riscv/frame_riscv.inline.hpp line 388: > 386: } > 387: if (is_upcall_stub_frame()) { > 388: return sender_for_upcall_stub_frame(map); looks like it's Foreign API related, do we need to introduce it in this PR? src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 558: > 556: > 557: class NativePostCallNop: public NativeInstruction { > 558: public: could you add some comments for NativePostCallNop just like aarch64 did? src/hotspot/cpu/riscv/riscv.ad line 2461: > 2459: __ bnez(flag, no_count); > 2460: > 2461: __ ld(tmp, Address(xthread, JavaThread::held_monitor_count_offset())); can we just use `increment(Address(xthread, JavaThread::held_monitor_count_offset()))` here? src/hotspot/cpu/riscv/riscv.ad line 2540: > 2538: __ bnez(flag, no_count); > 2539: > 2540: __ ld(tmp, Address(xthread, JavaThread::held_monitor_count_offset())); `increment(Address)` can do the same thing. src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp line 979: > 977: > 978: // Make sure the call is patchable > 979: __ align(NativeInstruction::instruction_size); alignment was also done in `emit_trampoline_stub`, do we still need this `align` before emitting a trampoline call? src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp line 1006: > 1004: > 1005: // Make sure the call is patchable > 1006: __ align(NativeInstruction::instruction_size); ditto src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp line 1030: > 1028: exception_offset = __ pc() - start; > 1029: { > 1030: __ mv(x9, x10); // save return value contaning the exception oop in callee-saved R9 maybe R9 -> x9 ?
src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp line 1236: > 1234: guarantee(false, "Unknown Continuation native intrinsic"); > 1235: } > 1236: aarch64 has some assertions here, do we need this? #ifdef ASSERT if (method->is_continuation_enter_intrinsic()) { assert(interpreted_entry_offset != -1, "Must be set"); assert(exception_offset != -1, "Must be set"); } else { assert(interpreted_entry_offset == -1, "Must be unset"); assert(exception_offset == -1, "Must be unset"); } assert(frame_complete != -1, "Must be set"); assert(stack_slots != -1, "Must be set"); assert(vep_offset != -1, "Must be set"); #endif src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3879: > 3877: __ ld(c_rarg1, Address(fp, -1 * wordSize)); // return address > 3878: __ verify_oop(x10); > 3879: __ mv(x9, x10); // save return value contaning the exception oop in callee-saved R9 maybe R9 -> x9? ------------- PR: https://git.openjdk.org/jdk/pull/10917 From xlinzheng at openjdk.org Wed Nov 2 10:20:18 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 2 Nov 2022 10:20:18 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V In-Reply-To: <pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> Message-ID: <u86mMOShSd9Gt6XnSZ6TpRqreF6QgeMGo11wKXDH4CI=.a71a61f4-b80b-418e-9e82-2de77536ffb4@github.com> On Wed, 2 Nov 2022 08:51:23 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: >> Hi, >> >> Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. >> >> This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. >> Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't >> affect the rest of the world in theory. 
>> >> There exists some differences in frame structure between AArch64 and RISC-V. >> For AArch64, we have: >> >> enum { >> link_offset = 0, >> return_addr_offset = 1, >> sender_sp_offset = 2 >> }; >> >> While for RISC-V, we have: >> >> enum { >> link_offset = -2, >> return_addr_offset = -1, >> sender_sp_offset = 0 >> }; >> >> So we need adapations in some places where the code relies on value of sender_sp_offset to work. >> Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to >> evaluate more on its impact on performance. >> >> Testing on Linux-riscv64 HiFive Unmatched board: >> - Minimal, Client and Server release & fastdebug build OK. >> - Passed tier1-tier4 tests (release build). >> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). >> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). > > src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp line 979: > >> 977: >> 978: // Make sure the call is patchable >> 979: __ align(NativeInstruction::instruction_size); > > alignment was also done in `emit_trampoline_stub`, do we still need this `align` before emitting a trampoline call? Maybe I can help to answer this question. This is an RVC-related change. We want the `call site` itself, which is the `jal` instruction, to be aligned to be patchable, for RVC can make it 2-byte aligned. [code seg] ... jal <trampoline start addr> <--- we are here, and want to force aligning this call site. ... [stub seg] ... <trampoline start addr>: auipc ld jalr [64-bit real address] <- this is certainly aligned to 8, as you've mentioned. ... 
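To make the alignment argument concrete, here is a minimal sketch (not taken from the patch) of how much padding is needed before the `jal` call site when compressed (RVC) instructions have left the current position only 2-byte aligned. The 4-byte instruction size and the helper names `padding_to_align` and `aligned_call_site` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed size in bytes of an uncompressed RISC-V instruction such as `jal`.
constexpr std::uint64_t kInstructionSize = 4;

// Number of padding bytes needed to bring `pc` up to a multiple of `alignment`.
constexpr std::uint64_t padding_to_align(std::uint64_t pc, std::uint64_t alignment) {
  return (alignment - pc % alignment) % alignment;
}

// Address at which the call site would be emitted after aligning it.
// (Hypothetical helper; it only mirrors the effect of aligning before the call.)
constexpr std::uint64_t aligned_call_site(std::uint64_t pc) {
  return pc + padding_to_align(pc, kInstructionSize);
}

// After a 2-byte compressed instruction the position may be 2 mod 4,
// so one more 2-byte pad is needed before the call site.
static_assert(padding_to_align(0x1002, kInstructionSize) == 2, "needs 2 bytes of padding");
static_assert(padding_to_align(0x1004, kInstructionSize) == 0, "already aligned");
```

With this, a position of `0x1002` yields a call site at `0x1004`, while an already aligned position is left unchanged.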
------------- PR: https://git.openjdk.org/jdk/pull/10917 From iwalulya at openjdk.org Wed Nov 2 11:39:29 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 2 Nov 2022 11:39:29 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v4] In-Reply-To: <OT6LDjba8lTVL2AqnNfQUIiTskpHOPLZrEO2dSFw6J0=.3725deee-aaa1-4dd0-98d4-7052f16600d2@github.com> References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> <nteAER-qdJj4vHag2x4b8mnhV-we9eiwrYfEhhLmj5Q=.36680e80-b3e6-422c-80a1-bbdf496b8a69@github.com> <OT6LDjba8lTVL2AqnNfQUIiTskpHOPLZrEO2dSFw6J0=.3725deee-aaa1-4dd0-98d4-7052f16600d2@github.com> Message-ID: <1ozVmm0X1NMVAiNAd25OFOdbDnBWeMYm0KctETIM_a0=.e407fdd4-7190-44c2-9faf-2efca5b897b0@github.com> On Mon, 24 Oct 2022 06:55:26 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> make claim InternalTableClaimer method > > Hey! > > Have you looked in: > concurrentHashTableTasks.inline.hpp > It contains BucketsOperation which is a base to do segmented work (very similar to BucketsClaimer). > The segmented work may be done with multiple threads. Thanks @robehn and @tschatzl for the reviews! 
------------- PR: https://git.openjdk.org/jdk/pull/10759 From iwalulya at openjdk.org Wed Nov 2 11:41:30 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 2 Nov 2022 11:41:30 GMT Subject: Integrated: 8233697: CHT: Iteration parallelization In-Reply-To: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: <dHq1xKW_HlWi0oQeLPizMVPypkzpT8I_pxJXqZLmaVQ=.7463e044-3e52-4224-b9bb-2959ea41747b@github.com> On Wed, 19 Oct 2022 10:15:46 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote: > Hi, > > Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. > > Usecase is in parallelizing the merging of large remsets for G1. > > Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). > > Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. > This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). > > This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. > > Testing: tier 1-3 This pull request has now been integrated. 
Changeset: 1a58cb1c Author: Ivan Walulya <iwalulya at openjdk.org> URL: https://git.openjdk.org/jdk/commit/1a58cb1c023c876594e8a53d00703e564a922d36 Stats: 239 lines in 4 files changed: 202 ins; 27 del; 10 mod 8233697: CHT: Iteration parallelization Reviewed-by: tschatzl, rehn ------------- PR: https://git.openjdk.org/jdk/pull/10759 From luhenry at openjdk.org Wed Nov 2 13:22:22 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 2 Nov 2022 13:22:22 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v2] In-Reply-To: <YGN0d-bKsd_caM3cSnqHFI-rWtczvFJ3nwAynvoxdmw=.3a7ec3d1-6e89-45d8-b148-d2906fd877b5@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <KJRBGLyEsrMHOwR4lQVvm-jr2GWHu_glKKNvb7PF7q0=.9d71a4d8-21d8-450b-8c4b-aed36bdb3cbb@github.com> <YGN0d-bKsd_caM3cSnqHFI-rWtczvFJ3nwAynvoxdmw=.3a7ec3d1-6e89-45d8-b148-d2906fd877b5@github.com> Message-ID: <5X8pxLpqNNSSUp0ESbFBUJPdqGjtLPhFeGp29uB2jDU=.b4f16caa-540d-41f9-96e3-0208569f8ad0@github.com> On Wed, 2 Nov 2022 08:52:51 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove uncessary cache line alignement > > src/hotspot/cpu/riscv/riscv.ad line 5196: > >> 5194: >> 5195: ins_encode %{ >> 5196: __ addi(t0, as_Register($mem$$base), $mem$$disp); > > This might be further improved as I see prefetch instructions can receive some immediate offset. The offset needs to be aligned to 32 bytes (the lower 5 bits must be zero). There is then no guarantee that `$mem$$base + ($mem$$disp & ~((1<<5)-1))` is on the same cache line as `$mem$$base + $mem$$disp`. It's then easier to do a prefetch of `base+disp` with `offset = 0`.
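As a rough illustration of the point above, the sketch below compares the address reached when the displacement is truncated to an encodable offset (low 5 bits cleared) with the address reached when `base+disp` is computed first and offset 0 is used. The 64-byte cache line size and all function names are assumptions for the example, not values or APIs from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed cache line size in bytes, for illustration only.
constexpr std::uint64_t kCacheLineSize = 64;

// Mask that clears the low 5 bits, i.e. rounds an offset down to a multiple of 32.
constexpr std::int64_t kOffsetMask = ~((std::int64_t{1} << 5) - 1);

// Index of the cache line that contains `addr`.
constexpr std::uint64_t cache_line_of(std::uint64_t addr) {
  return addr / kCacheLineSize;
}

// Address prefetched if the displacement is truncated to an encodable offset.
constexpr std::uint64_t truncated_target(std::uint64_t base, std::int64_t disp) {
  return base + static_cast<std::uint64_t>(disp & kOffsetMask);
}

// Address prefetched if base+disp is computed first and offset 0 is used.
constexpr std::uint64_t exact_target(std::uint64_t base, std::int64_t disp) {
  return base + static_cast<std::uint64_t>(disp);
}

// True if truncating the displacement still hits the intended cache line.
constexpr bool truncation_keeps_cache_line(std::uint64_t base, std::int64_t disp) {
  return cache_line_of(truncated_target(base, disp)) ==
         cache_line_of(exact_target(base, disp));
}

// base = 60, disp = 10: the intended address 70 is on line 1,
// but the truncated offset is 0, so address 60 on line 0 is prefetched.
static_assert(!truncation_keeps_cache_line(60, 10), "truncation changes the line");
```

Under these assumptions the truncated form sometimes works (for example base 64, disp 40) and sometimes does not, which is why computing the full address first is the simpler choice.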
------------- PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Wed Nov 2 13:34:27 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 2 Nov 2022 13:34:27 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v2] In-Reply-To: <YGN0d-bKsd_caM3cSnqHFI-rWtczvFJ3nwAynvoxdmw=.3a7ec3d1-6e89-45d8-b148-d2906fd877b5@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <KJRBGLyEsrMHOwR4lQVvm-jr2GWHu_glKKNvb7PF7q0=.9d71a4d8-21d8-450b-8c4b-aed36bdb3cbb@github.com> <YGN0d-bKsd_caM3cSnqHFI-rWtczvFJ3nwAynvoxdmw=.3a7ec3d1-6e89-45d8-b148-d2906fd877b5@github.com> Message-ID: <iYL4J-SreswC_vymBffMxyXojb3ckOrrLyYPR9FMUBQ=.41be2844-d732-4b32-8d7f-692b3f67cbf9@github.com> On Wed, 2 Nov 2022 09:00:24 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove uncessary cache line alignement > > src/hotspot/os_cpu/linux_riscv/prefetch_linux_riscv.inline.hpp line 36: > >> 34: (void (*)(const void*, intptr_t))StubRoutines::riscv::prefetch_r(); >> 35: if (interval >= 0 && stub != NULL) { >> 36: stub(loc, interval); > > I am not sure if it really worth it to call a stub for read / write here. It looks to me not a big issue for the case the stub tries to catch and resolve. And I see aarch64 simply plant a 'prfm' instruction for prefetching [1]. I guess we might can do the same? > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/prefetch_linux_aarch64.inline.hpp#L34 We would need to check for `UseZicbop` in any case; the access to a global variable is then required. It would be the same issue as https://github.com/openjdk/jdk/pull/10884/files/e968f7164124dcf560807c9ff7765e6f82b64cdd#diff-e3c18b8b83898e82b5a3069319df6a47468e91cc2527bf065e704a685a20f26bR5196 without the stub. 
I have to admit that the `interval` naming here is confusing since no implementation ever uses it as an interval but always as an offset. Also, the callers assume it to be an offset, like `ContiguousSpace::prepare_for_compaction` for example. If we are to simply ignore the interval parameter, then we can also remove all uses of `PrefetchCopyIntervalInBytes`, `PrefetchScanIntervalInBytes`, ------------- PR: https://git.openjdk.org/jdk/pull/10884 From stefank at openjdk.org Wed Nov 2 14:46:46 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Nov 2022 14:46:46 GMT Subject: RFR: 8296231: Fix MEMFLAGS for CHeapBitMaps Message-ID: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> Some usages of CHeapBitMaps rely on the default value of the MEMFLAGS argument (mtInternal). This is undesirable and should be fixed. I'd prefer to remove the default value, but there is currently a PR touching the BitMap classes, so I'd like to limit this bug to only fixing the incorrect usage of mtInternal.
------------- Commit messages: - Fix CHeapBitMap MEMFLAGS Changes: https://git.openjdk.org/jdk/pull/10948/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10948&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296231 Stats: 27 lines in 8 files changed: 13 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/10948.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10948/head:pull/10948 PR: https://git.openjdk.org/jdk/pull/10948 From stuefe at openjdk.org Wed Nov 2 14:55:54 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 2 Nov 2022 14:55:54 GMT Subject: RFR: 8296231: Fix MEMFLAGS for CHeapBitMaps In-Reply-To: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> References: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> Message-ID: <vCEceQTqQGWU2Va2hHmNRi9bgfdRDiNmZMQfdT_bWKU=.13cd0cf6-8ffd-46e0-83fd-e9f7a6dc9ff5@github.com> On Wed, 2 Nov 2022 14:24:07 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Some usages of CHeapBitMaps rely on the default value of the MEMFLAGS argument (mtInternal). This is undesirable, and should be fixed. > > I'd prefer to remove the default value, but there is currently a PR touching the BitMap classes, so I'd like to limit this Bug to only fixing the incorrect usage of mtInternal. Looks reasonable. 
------------- PR: https://git.openjdk.org/jdk/pull/10948 From stuefe at openjdk.org Wed Nov 2 14:59:13 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 2 Nov 2022 14:59:13 GMT Subject: RFR: 8296231: Fix MEMFLAGS for CHeapBitMaps In-Reply-To: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> References: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> Message-ID: <UzAin3odGxyaz9-ebSXSRIQJLhB3Zk0MqHGKX5FjktM=.e2b10b4a-63db-4c74-9723-7974234bd9f2@github.com> On Wed, 2 Nov 2022 14:24:07 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > I'd prefer to remove the default value, but there is currently a PR touching the BitMap classes, so I'd like to limit this Bug to only fixing the incorrect usage of mtInternal. I always wanted something like a thread-bound default Memflags, that could be set with something like MemFlagsMark. It would define the Memflags to be used for the extent if no explicit Memflags are given. This would make sense, especially for utility classes that are used on behalf of other code. For example, wherever we dive into Metaspace allocation, we set the default flag to mtMetaspace, which is probably a good default for all subsequent mallocs in this extent. What do you think? 
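A minimal sketch of what such a thread-bound default could look like, assuming a scoped RAII object. Nothing here exists in HotSpot: `MemFlag`, `g_default_flag`, `MemFlagsMark` and `effective_flag` are hypothetical names, and the enum is a small stand-in for MEMFLAGS.

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's MEMFLAGS; only a few values are shown.
enum class MemFlag { mtNone, mtInternal, mtMetaspace, mtGC };

// Thread-bound default flag; mtNone means "no default has been set".
thread_local MemFlag g_default_flag = MemFlag::mtNone;

// Scoped mark: sets the thread's default flag for its lifetime and
// restores the previous value on destruction, so marks can nest.
class MemFlagsMark {
  MemFlag _saved;
 public:
  explicit MemFlagsMark(MemFlag flag) : _saved(g_default_flag) {
    g_default_flag = flag;
  }
  ~MemFlagsMark() { g_default_flag = _saved; }
  MemFlagsMark(const MemFlagsMark&) = delete;
  MemFlagsMark& operator=(const MemFlagsMark&) = delete;
};

// Flag an allocation would be accounted to: an explicit flag wins,
// then the thread-bound default, then mtInternal as the last resort.
MemFlag effective_flag(MemFlag explicit_flag = MemFlag::mtNone) {
  if (explicit_flag != MemFlag::mtNone) return explicit_flag;
  if (g_default_flag != MemFlag::mtNone) return g_default_flag;
  return MemFlag::mtInternal;
}
```

In this sketch, code entering Metaspace allocation would place `MemFlagsMark mark(MemFlag::mtMetaspace);` on the stack, and any utility class allocating without an explicit flag inside that extent would be accounted to mtMetaspace.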
------------- PR: https://git.openjdk.org/jdk/pull/10948 From ihse at openjdk.org Wed Nov 2 15:39:41 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 2 Nov 2022 15:39:41 GMT Subject: RFR: 8294591: Fix cast-function-type warning in TemplateTable [v4] In-Reply-To: <2jHTulm9tMRZEq_tFq7UMUJjWwIS7TIoOkfxjVsseG8=.e69ac9a2-cf39-4972-818a-77cdf0ed93d6@github.com> References: <F9kScSH8YZ9kK8XAcl5pPPE6sPrHCBUc_zNriSaP8EU=.d5c7452c-72ed-4f41-9dea-5bd8a7fccbc0@github.com> <2jHTulm9tMRZEq_tFq7UMUJjWwIS7TIoOkfxjVsseG8=.e69ac9a2-cf39-4972-818a-77cdf0ed93d6@github.com> Message-ID: <-lSQrZXaxtGjPL-LuHHvroS8HOTvVdWgAx4J2yIAm60=.e932a8fd-8218-4068-96f0-3df02cf1d508@github.com> On Mon, 17 Oct 2022 18:08:56 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> After [JDK-8294314](https://bugs.openjdk.org/browse/JDK-8294314), we would have `templateTable.cpp` excluded with cast-function-type warning. The underlying cause for it is casting functions for `ldc` bytecodes, which take `bool`-typed handlers: > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fix build failures Build changes are trivially good. I have not looked at hotspot source code changes. ------------- Marked as reviewed by ihse (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10493 From coleenp at openjdk.org Wed Nov 2 16:16:26 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 2 Nov 2022 16:16:26 GMT Subject: RFR: 8294591: Fix cast-function-type warning in TemplateTable [v4] In-Reply-To: <2jHTulm9tMRZEq_tFq7UMUJjWwIS7TIoOkfxjVsseG8=.e69ac9a2-cf39-4972-818a-77cdf0ed93d6@github.com> References: <F9kScSH8YZ9kK8XAcl5pPPE6sPrHCBUc_zNriSaP8EU=.d5c7452c-72ed-4f41-9dea-5bd8a7fccbc0@github.com> <2jHTulm9tMRZEq_tFq7UMUJjWwIS7TIoOkfxjVsseG8=.e69ac9a2-cf39-4972-818a-77cdf0ed93d6@github.com> Message-ID: <jYSBCqpCKMXkMQveFDcwGLvmgII6Rx6UbGRTCv0_8mo=.83fd6377-a70d-46cf-a9cc-082796e1481b@github.com> On Mon, 17 Oct 2022 18:08:56 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> After [JDK-8294314](https://bugs.openjdk.org/browse/JDK-8294314), we would have `templateTable.cpp` excluded with cast-function-type warning. The underlying cause for it is casting functions for `ldc` bytecodes, which take `bool`-typed handlers: > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fix build failures Seems ok. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/10493 From coleenp at openjdk.org Wed Nov 2 16:47:01 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 2 Nov 2022 16:47:01 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing Message-ID: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. Tested with tier1-6. 
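The idea can be sketched roughly as follows. This is a simplified model, not the HotSpot implementation: `Obj`, `identity_hash` and `TagMap` are illustrative names, a hash of 0 is assumed to mean "no hash assigned yet", and entries are matched by plain pointer, which ignores object movement during GC.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Toy object model: 0 means "no identity hash assigned yet" in this sketch.
struct Obj {
  std::uint32_t hash = 0;
};

// Assigns a hash on first use and returns the same value afterwards.
std::uint32_t identity_hash(Obj& o) {
  static std::uint32_t next_hash = 1;
  if (o.hash == 0) o.hash = next_hash++;
  return o.hash;
}

class TagMap {
  // Keyed by identity hash; the value holds the object and its tag.
  std::unordered_multimap<std::uint32_t, std::pair<const Obj*, std::int64_t>> _table;

 public:
  // Tagging an object assigns its identity hash if it has none.
  void set_tag(Obj& o, std::int64_t tag) {
    const std::uint32_t h = identity_hash(o);
    auto range = _table.equal_range(h);
    for (auto it = range.first; it != range.second; ++it) {
      if (it->second.first == &o) { it->second.second = tag; return; }
    }
    _table.emplace(h, std::make_pair(&o, tag));
  }

  // Returns 0 (untagged) without touching the table if the object was never hashed.
  std::int64_t get_tag(const Obj& o) const {
    if (o.hash == 0) return 0;  // no hash, so it cannot be in the table
    auto range = _table.equal_range(o.hash);
    for (auto it = range.first; it != range.second; ++it) {
      if (it->second.first == &o) return it->second.second;
    }
    return 0;
  }
};
```

Because the key does not depend on the object's address, nothing in this model needs to be rehashed when objects move; a lookup also never assigns a hash to an object that lacks one.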
------------- Commit messages: - 8256072: Eliminate JVMTI tagmap rehashing Changes: https://git.openjdk.org/jdk/pull/10938/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10938&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8256072 Stats: 108 lines in 12 files changed: 10 ins; 93 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10938.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10938/head:pull/10938 PR: https://git.openjdk.org/jdk/pull/10938 From matsaave at openjdk.org Wed Nov 2 16:55:50 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 2 Nov 2022 16:55:50 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v8] In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: <o82QDeeioOhR4esSm67qc2TC3x2TooRLOe8_LssN1EI=.d59f2caf-8a05-4e68-9154-01707bb7f095@github.com> > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. 
> > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 <Concat0.s/Ljava/lang/String;> > 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> > BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' 
@16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 <Concat0.d/Ljava/lang/String;> > 18 iinc #1 1 > 21 goto 2 > 24 return Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed code formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10860/files - new: https://git.openjdk.org/jdk/pull/10860/files/03154267..05495a94 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From asmehra at redhat.com Wed Nov 2 19:33:08 2022 From: asmehra at redhat.com (Ashutosh Mehra) Date: Wed, 2 Nov 2022 15:33:08 -0400 Subject: Uniform APIs for using archived heap regions Message-ID: <CAKt0pyQLge2-H9M0wPYRqsh3L6kmiVn8ghoRM2_MwVfe8kNRGA@mail.gmail.com> Hi, I have been working on adding support for CDS archived heap regions in Shenandoah GC, in the way similar to existing non-G1 GC policies like serial, parallel and epsilon, which 
"load" the archived heap regions from the CDS archive file into the heap, as opposed to G1 GC, which "maps" the archived heap regions. But I soon realized that Shenandoah, being a region-based collector, is not well positioned to use the existing API [1] that allocates memory regions in the heap for the archived heap regions. Since G1 is also a region-based collector, I started looking at how G1 performs mapping of archived heap regions, and realized Shenandoah can do the same as well. But the mechanism and the APIs used for mapping the heap regions in G1 GC are tightly coupled to G1's internal implementation. I felt there is significant room for improving the APIs used for mapping the heap regions, thereby reducing the complexity and the coupling between the CDS and the GC policy being used. While working on the new APIs, I realized I can also use them for non-G1 GC policies, in a manner similar to the APIs that "load" the archived heap regions. So now I have an implementation for mapping archived heap regions which uses uniform APIs across all existing GC policies that currently support use of archived heap regions. I am pretty confident I would be able to use the same set of APIs for Shenandoah GC, and hopefully it would work for ZGC as well (if anyone takes that up). I will raise a PR for feedback soon, once basic testing completes. For those interested in looking at the code before then, it is in my branch [2]. [1] https://github.com/openjdk/jdk/blob/f84b0ad07c73c305d21c71ec6b8195dc1ee31a3e/src/hotspot/share/gc/shared/collectedHeap.hpp#L518 [2] https://github.com/openjdk/jdk/compare/master...ashu-mehra:jdk:archived-heap-support-v2 Regards, Ashutosh Mehra -------------- next part -------------- An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20221102/c41967a1/attachment.htm> From coleenp at openjdk.org Wed Nov 2 20:44:58 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 2 Nov 2022 20:44:58 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v2] In-Reply-To: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> Message-ID: <sXDcEoI8qXoC164lgpNUFoxM-mGdSGgxlBH0ydoGe58=.a47c80fa-14b6-499f-842d-282a3e9de287@github.com> > Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. > Tested with tier1-6. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into jvmti - 8256072: Eliminate JVMTI tagmap rehashing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10938/files - new: https://git.openjdk.org/jdk/pull/10938/files/29fb0c2f..e549dcb5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10938&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10938&range=00-01 Stats: 32521 lines in 114 files changed: 2966 ins; 29135 del; 420 mod Patch: https://git.openjdk.org/jdk/pull/10938.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10938/head:pull/10938 PR: https://git.openjdk.org/jdk/pull/10938 From kbarrett at openjdk.org Wed Nov 2 20:55:26 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 2 Nov 2022 20:55:26 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v2] In-Reply-To: <sXDcEoI8qXoC164lgpNUFoxM-mGdSGgxlBH0ydoGe58=.a47c80fa-14b6-499f-842d-282a3e9de287@github.com> References: 
<e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <sXDcEoI8qXoC164lgpNUFoxM-mGdSGgxlBH0ydoGe58=.a47c80fa-14b6-499f-842d-282a3e9de287@github.com> Message-ID: <dvNAViCuPfeHoSKNYl0iGkJ-wnmj6o23nLfUcQdTsPs=.8d7e9cb8-c597-4dd4-9c91-05483d60c5e3@github.com> On Wed, 2 Nov 2022 20:44:58 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. >> Tested with tier1-6. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into jvmti > - 8256072: Eliminate JVMTI tagmap rehashing Looks good. Yay code deletion! I was particularly happy to see `set_needs_rehashing` removed from all the GCs. As a followup, I think `CollectedHeap::hash_code` is unused after this change. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/10938 From coleenp at openjdk.org Wed Nov 2 22:23:57 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 2 Nov 2022 22:23:57 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v3] In-Reply-To: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> Message-ID: <Jqk_jqj5r60bf_vubYOVphT9LN35_LjcKirN1tC6u6c=.7f1bcbb7-3c1d-4fdf-9f75-517cf8389fb8@github.com> > Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. > Tested with tier1-6. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Remove now-unused function that I missed. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/10938/files - new: https://git.openjdk.org/jdk/pull/10938/files/e549dcb5..f214791d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10938&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10938&range=01-02 Stats: 20 lines in 6 files changed: 0 ins; 19 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10938.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10938/head:pull/10938 PR: https://git.openjdk.org/jdk/pull/10938 From coleenp at openjdk.org Wed Nov 2 22:25:21 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 2 Nov 2022 22:25:21 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v2] In-Reply-To: <sXDcEoI8qXoC164lgpNUFoxM-mGdSGgxlBH0ydoGe58=.a47c80fa-14b6-499f-842d-282a3e9de287@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <sXDcEoI8qXoC164lgpNUFoxM-mGdSGgxlBH0ydoGe58=.a47c80fa-14b6-499f-842d-282a3e9de287@github.com> Message-ID: <PR7QoNhgTfRE7a6Zsyx5KxeSW12k-tIM0nsddLx3ASE=.65dcbaab-73a1-47e5-b9a5-c5ad40e59cab@github.com> On Wed, 2 Nov 2022 20:44:58 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. >> Tested with tier1-6. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into jvmti > - 8256072: Eliminate JVMTI tagmap rehashing Thanks for the code review Kim. I removed the function that you noticed is now unused. 
------------- PR: https://git.openjdk.org/jdk/pull/10938 From dlong at openjdk.org Wed Nov 2 22:26:15 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 2 Nov 2022 22:26:15 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V In-Reply-To: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> Message-ID: <VUwT5X0MCtAVQ0D531lePcYXx8zOI1ckRh63_AzhHRU=.3421aa87-a000-4539-9a9d-b8091df5ac9f@github.com> On Mon, 31 Oct 2022 12:41:28 GMT, Fei Yang <fyang at openjdk.org> wrote: > Hi, > > Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. > > This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. > Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't > affect the rest of the world in theory. > > There exists some differences in frame structure between AArch64 and RISC-V. > For AArch64, we have: > > enum { > link_offset = 0, > return_addr_offset = 1, > sender_sp_offset = 2 > }; > > While for RISC-V, we have: > > enum { > link_offset = -2, > return_addr_offset = -1, > sender_sp_offset = 0 > }; > > So we need adaptations in some places where the code relies on value of sender_sp_offset to work. > Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to > evaluate more on its impact on performance. > > Testing on Linux-riscv64 HiFive Unmatched board: > - Minimal, Client and Server release & fastdebug build OK. > - Passed tier1-tier4 tests (release build). > - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). > - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build).
src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1053: > 1051: > 1052: intptr_t* const stack_frame_bottom = ContinuationHelper::InterpretedFrame::frame_bottom(f); > 1053: assert(stack_frame_bottom - stack_frame_top >= fsize, ""); // == on x86 Are there any ports where (stack_frame_bottom - stack_frame_top) != fsize? It would be nice if we could use (stack_frame_bottom - stack_frame_top) for fsize and remove the platform-specific computation using frame::metadata_words above. ------------- PR: https://git.openjdk.org/jdk/pull/10917 From dholmes at openjdk.org Wed Nov 2 22:50:59 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 2 Nov 2022 22:50:59 GMT Subject: RFR: 8296262: Remove dead code from InstanceKlass::signature_name() Message-ID: <xM9Y0A0Dt9Zo8Nf2aeW1j3BvTO2p7FLNaGh3ken2CDA=.dbfb7202-a20c-4d0a-8264-d67eb42fc162@github.com> Trivial dead code removal. Testing: simple build. Thanks. ------------- Commit messages: - 8296262: Remove dead code from InstanceKlass::signature_name() Changes: https://git.openjdk.org/jdk/pull/10962/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10962&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296262 Stats: 8 lines in 1 file changed: 0 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10962.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10962/head:pull/10962 PR: https://git.openjdk.org/jdk/pull/10962 From iklam at openjdk.org Wed Nov 2 22:53:21 2022 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 2 Nov 2022 22:53:21 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v8] In-Reply-To: <o82QDeeioOhR4esSm67qc2TC3x2TooRLOe8_LssN1EI=.d59f2caf-8a05-4e68-9154-01707bb7f095@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> <o82QDeeioOhR4esSm67qc2TC3x2TooRLOe8_LssN1EI=.d59f2caf-8a05-4e68-9154-01707bb7f095@github.com> Message-ID: 
<aUI98nAxbw8tc3rY3EKKCCY0uTvZsknQdatYKJrW5VY=.65e669a9-9f4c-482d-8290-e3dd8cf8fe2a@github.com> On Wed, 2 Nov 2022 16:55:50 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote: >> As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. >> >> The text format and contents are tentative, please review. >> >> Here is an example output when using `findmethod()`: >> >> "Executing findmethod" >> flags (bitmask): >> 0x01 - print names of methods >> 0x02 - print bytecodes >> 0x04 - print the address of bytecodes >> 0x08 - print info for invokedynamic >> 0x10 - print info for invokehandle >> >> [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} >> 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V >> 0 iconst_0 >> 1 istore_1 >> 2 iload_1 >> 3 iconst_2 >> 4 if_icmpge 24 >> 7 getstatic 7 <Concat0.s/Ljava/lang/String;> >> 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> >> BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> >> arguments[1] = { >> 000 >> } >> ConstantPoolCacheEntry: 4 >> - this: 0x00007fffa0400570 >> - bytecode 1: invokedynamic ba >> - bytecode 2: nop 00 >> - cp index: 13 >> - F1: [ 0x00000008000c8658] >> - F2: [ 0x0000000000000003] >> - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) >> - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] >> - tos: object >> - local signature: 1 >> - has appendix: 1 >> - forced virtual: 0 >> - final: 1 >> - virtual 
Final: 0 >> - resolution Failed: 0 >> - num Parameters: 02 >> Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; >> appendix: java.lang.invoke.BoundMethodHandle$Species_LL >> {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' >> - ---- fields (total size 5 words): >> - private 'customizationCount' 'B' @12 0 (0x00) >> - private volatile 'updateInProgress' 'Z' @13 false (0x00) >> - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) >> - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) >> - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) >> - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) >> - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) >> - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) >> ------------- >> 15 putstatic 17 <Concat0.d/Ljava/lang/String;> >> 18 iinc #1 1 >> 21 goto 2 >> 24 return > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed code formatting Marked as reviewed by iklam (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/10860 From iklam at openjdk.org Wed Nov 2 23:19:19 2022 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 2 Nov 2022 23:19:19 GMT Subject: RFR: 8296262: Remove dead code from InstanceKlass::signature_name() In-Reply-To: <xM9Y0A0Dt9Zo8Nf2aeW1j3BvTO2p7FLNaGh3ken2CDA=.dbfb7202-a20c-4d0a-8264-d67eb42fc162@github.com> References: <xM9Y0A0Dt9Zo8Nf2aeW1j3BvTO2p7FLNaGh3ken2CDA=.dbfb7202-a20c-4d0a-8264-d67eb42fc162@github.com> Message-ID: <x0pDeLQk3wRVUvsyFoCVE-Rbtjk_5qow92axdkpPetc=.d8caae17-4e23-4dbc-8a33-78bc5d9cabe6@github.com> On Wed, 2 Nov 2022 22:41:45 GMT, David Holmes <dholmes at openjdk.org> wrote: > Trivial dead code removal. > > Testing: simple build. > > Thanks. Looks good and trivial. ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.org/jdk/pull/10962 From fyang at openjdk.org Thu Nov 3 01:53:22 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Nov 2022 01:53:22 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v2] In-Reply-To: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> Message-ID: <6ip7zHZWudSYJgbIkyl3aQmNbY2tCXYJxF7C5V9kwpw=.6b63ea4e-0bab-4bed-a87a-6fb1054d3c03@github.com> > Hi, > > Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. > > This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. > Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't > affect the rest of the world in theory. > > There exists some differences in frame structure between AArch64 and RISC-V. 
> For AArch64, we have: > > enum { > link_offset = 0, > return_addr_offset = 1, > sender_sp_offset = 2 > }; > > While for RISC-V, we have: > > enum { > link_offset = -2, > return_addr_offset = -1, > sender_sp_offset = 0 > }; > > So we need adaptations in some places where the code relies on value of sender_sp_offset to work. > Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to > evaluate more on its impact on performance. > > Testing on Linux-riscv64 HiFive Unmatched board: > - Minimal, Client and Server release & fastdebug build OK. > - Passed tier1-tier4 tests (release build). > - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). > - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10917/files - new: https://git.openjdk.org/jdk/pull/10917/files/37e1616d..36df84f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10917&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10917&range=00-01 Stats: 35 lines in 4 files changed: 20 ins; 5 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/10917.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10917/head:pull/10917 PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Thu Nov 3 02:01:48 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Nov 2022 02:01:48 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v2] In-Reply-To: <pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com>
<pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> Message-ID: <zxmNlpgRn_KRzT57qV-l-Dlld1N-nNYweOqZkmYofLQ=.c7ef0419-a3be-4abc-a987-2d2f1ab7bb71@github.com> On Tue, 1 Nov 2022 14:00:36 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix > > src/hotspot/cpu/riscv/frame_riscv.inline.hpp line 388: > >> 386: } >> 387: if (is_upcall_stub_frame()) { >> 388: return sender_for_upcall_stub_frame(map); > > looks like it's Foreign API related, do we need to introduce it in this pr? Let's keep it, as the Foreign-API RISC-V port is almost ready. > src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 558: > >> 556: >> 557: class NativePostCallNop: public NativeInstruction { >> 558: public: > > could you add some comments for NativePostCallNop just like aarch64 did? As I mentioned in the PR description, the implementation of the Post-call NOPs optimization is not incorporated in this PR. We will add the necessary comments when we add the Post-call NOPs optimization in a separate PR. > src/hotspot/cpu/riscv/riscv.ad line 2461: > >> 2459: __ bnez(flag, no_count); >> 2460: >> 2461: __ ld(tmp, Address(xthread, JavaThread::held_monitor_count_offset())); > > can we just use `increment(Address(xthread, JavaThread::held_monitor_count_offset()))` here? Note that increment/decrement may clobber the 't1' register in certain cases, which would conflict with "flag", which aliases 't1' here. It is safer to use the 'tmp' register in this context.
------------- PR: https://git.openjdk.org/jdk/pull/10917 From fjiang at openjdk.org Thu Nov 3 02:01:49 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 3 Nov 2022 02:01:49 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v2] In-Reply-To: <u86mMOShSd9Gt6XnSZ6TpRqreF6QgeMGo11wKXDH4CI=.a71a61f4-b80b-418e-9e82-2de77536ffb4@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> <u86mMOShSd9Gt6XnSZ6TpRqreF6QgeMGo11wKXDH4CI=.a71a61f4-b80b-418e-9e82-2de77536ffb4@github.com> Message-ID: <eJPi4KzoXKrXLVE8S-2QkihpRVrAjDwHJA-7A1urGRA=.1fadd116-4d74-49f9-a6a4-1edd1758da33@github.com> On Wed, 2 Nov 2022 10:13:36 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp line 979: >> >>> 977: >>> 978: // Make sure the call is patchable >>> 979: __ align(NativeInstruction::instruction_size); >> >> alignment was also done in `emit_trampoline_stub`, do we still need this `align` before emitting a trampoline call? > > Maybe I can help to answer this question. > > This is an RVC-related change. > We want the `call site` itself, which is the `jal` instruction, to be aligned to be patchable, for RVC can make it 2-byte aligned. > > [code seg] > ... > jal <trampoline start addr> <--- we are here, and want to force aligning this call site. > ... > > > [stub seg] > ... > <trampoline start addr>: > auipc > ld > jalr > [64-bit real address] <- this is certainly aligned to 8, as you've mentioned. > ... Looks reasonable. As you mentioned that it's an RVC-related change, do we need this alignment when `UseRVC` is disabled?
------------- PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Thu Nov 3 02:04:59 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Nov 2022 02:04:59 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v2] In-Reply-To: <pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> Message-ID: <pMaK8wxNG0ThM6OQRofkFajW7esulINMvA3OCl7JcBs=.4b3865fc-dace-4857-981d-e618a601a830@github.com> On Wed, 2 Nov 2022 09:03:16 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix > > src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp line 1236: > >> 1234: guarantee(false, "Unknown Continuation native intrinsic"); >> 1235: } >> 1236: > > aarch64 has some assertions here, do we need this? > > #ifdef ASSERT > if (method->is_continuation_enter_intrinsic()) { > assert(interpreted_entry_offset != -1, "Must be set"); > assert(exception_offset != -1, "Must be set"); > } else { > assert(interpreted_entry_offset == -1, "Must be unset"); > assert(exception_offset == -1, "Must be unset"); > } > assert(frame_complete != -1, "Must be set"); > assert(stack_slots != -1, "Must be set"); > assert(vep_offset != -1, "Must be set"); > #endif Good catch. I think I missed this issue: "8293654: Improve SharedRuntime handling of continuation helper out-arguments". I have just added handling for RISC-V. I think this is kind of refactoring work and won't affect basic functionality. Thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Thu Nov 3 02:12:31 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Nov 2022 02:12:31 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v2] In-Reply-To: <eJPi4KzoXKrXLVE8S-2QkihpRVrAjDwHJA-7A1urGRA=.1fadd116-4d74-49f9-a6a4-1edd1758da33@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> <u86mMOShSd9Gt6XnSZ6TpRqreF6QgeMGo11wKXDH4CI=.a71a61f4-b80b-418e-9e82-2de77536ffb4@github.com> <eJPi4KzoXKrXLVE8S-2QkihpRVrAjDwHJA-7A1urGRA=.1fadd116-4d74-49f9-a6a4-1edd1758da33@github.com> Message-ID: <Zgo9ME_GefZzjvx0lvMwtw6oRDIg3OAqBS5-lBbkVbY=.dd8d8ff0-9c2f-42c9-88d6-51f8428aab02@github.com> On Thu, 3 Nov 2022 01:53:11 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: >> Maybe I can help to answer this question. >> >> This is an RVC-related change. >> We want the `call site` itself, which is the `jal` instruction, to be aligned to be patchable, for RVC can make it 2-byte aligned. >> >> [code seg] >> ... >> jal <trampoline start addr> <--- we are here, and want to force aligning this call site. >> ... >> >> >> [stub seg] >> ... >> <trampoline start addr>: >> auipc >> ld >> jalr >> [64-bit real address] <- this is certainly aligned to 8, as you've mentioned. >> ... > > Looks reasonable. As you mentioned that it's an RVC-related change, do we need this alignment when `UseRVC` is disabled? This is necessary to support for RISC-V RVC extension whether it is turned on or off. And I think we are ready to enable use of RVC by default when this hardware extension is available.
------------- PR: https://git.openjdk.org/jdk/pull/10917 From dholmes at openjdk.org Thu Nov 3 02:28:29 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 3 Nov 2022 02:28:29 GMT Subject: RFR: 8296262: Remove dead code from InstanceKlass::signature_name() In-Reply-To: <x0pDeLQk3wRVUvsyFoCVE-Rbtjk_5qow92axdkpPetc=.d8caae17-4e23-4dbc-8a33-78bc5d9cabe6@github.com> References: <xM9Y0A0Dt9Zo8Nf2aeW1j3BvTO2p7FLNaGh3ken2CDA=.dbfb7202-a20c-4d0a-8264-d67eb42fc162@github.com> <x0pDeLQk3wRVUvsyFoCVE-Rbtjk_5qow92axdkpPetc=.d8caae17-4e23-4dbc-8a33-78bc5d9cabe6@github.com> Message-ID: <LajbsieIkTeakPDDExRdUtpL7Cyv3Jcq4MyzC2Jb5dU=.6d4ca17c-b8a6-4252-8aa5-1fb4edcb6636@github.com> On Wed, 2 Nov 2022 23:15:20 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> Trivial dead code removal. >> >> Testing: simple build. >> >> Thanks. > > Looks good and trivial. Thanks @iklam ! ------------- PR: https://git.openjdk.org/jdk/pull/10962 From dholmes at openjdk.org Thu Nov 3 02:31:30 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 3 Nov 2022 02:31:30 GMT Subject: Integrated: 8296262: Remove dead code from InstanceKlass::signature_name() In-Reply-To: <xM9Y0A0Dt9Zo8Nf2aeW1j3BvTO2p7FLNaGh3ken2CDA=.dbfb7202-a20c-4d0a-8264-d67eb42fc162@github.com> References: <xM9Y0A0Dt9Zo8Nf2aeW1j3BvTO2p7FLNaGh3ken2CDA=.dbfb7202-a20c-4d0a-8264-d67eb42fc162@github.com> Message-ID: <ExCJG0YBuHcC3AthNSf9TUAXGEW-vniFGzexnFeyoz4=.df6d306d-0b2b-4eb3-aea0-96c1797fd1ac@github.com> On Wed, 2 Nov 2022 22:41:45 GMT, David Holmes <dholmes at openjdk.org> wrote: > Trivial dead code removal. > > Testing: simple build. > > Thanks. This pull request has now been integrated. 
Changeset: 13b20e0e Author: David Holmes <dholmes at openjdk.org> URL: https://git.openjdk.org/jdk/commit/13b20e0e6dda5a05edb05212a5774a960ab0f03b Stats: 8 lines in 1 file changed: 0 ins; 7 del; 1 mod 8296262: Remove dead code from InstanceKlass::signature_name() Reviewed-by: iklam ------------- PR: https://git.openjdk.org/jdk/pull/10962 From fyang at openjdk.org Thu Nov 3 02:32:00 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Nov 2022 02:32:00 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v3] In-Reply-To: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> Message-ID: <SmapI7Fdi1koj18uHL6gdma4IxhmBYa3ZAMD-z0rJg4=.4502a5b5-b177-4200-af99-484e52fd85a9@github.com> > Hi, > > Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. > > This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. > Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't > affect the rest of the world in theory. > > There exists some differences in frame structure between AArch64 and RISC-V. > For AArch64, we have: > > enum { > link_offset = 0, > return_addr_offset = 1, > sender_sp_offset = 2 > }; > > While for RISC-V, we have: > > enum { > link_offset = -2, > return_addr_offset = -1, > sender_sp_offset = 0 > }; > > So we need adaptations in some places where the code relies on value of sender_sp_offset to work. > Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to > evaluate more on its impact on performance. > > Testing on Linux-riscv64 HiFive Unmatched board: > - Minimal, Client and Server release & fastdebug build OK. > - Passed tier1-tier4 tests (release build).
> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). > - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into 8286301 - Fix - 8286301: JEP 425 to RISC-V ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10917/files - new: https://git.openjdk.org/jdk/pull/10917/files/36df84f2..06302f9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10917&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10917&range=01-02 Stats: 44237 lines in 350 files changed: 11454 ins; 31094 del; 1689 mod Patch: https://git.openjdk.org/jdk/pull/10917.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10917/head:pull/10917 PR: https://git.openjdk.org/jdk/pull/10917 From fjiang at openjdk.org Thu Nov 3 02:32:03 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 3 Nov 2022 02:32:03 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v2] In-Reply-To: <6ip7zHZWudSYJgbIkyl3aQmNbY2tCXYJxF7C5V9kwpw=.6b63ea4e-0bab-4bed-a87a-6fb1054d3c03@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <6ip7zHZWudSYJgbIkyl3aQmNbY2tCXYJxF7C5V9kwpw=.6b63ea4e-0bab-4bed-a87a-6fb1054d3c03@github.com> Message-ID: <FEcguXc2FfMemsGUtUOU6M6WlLZUcPCnUbwcxNEga80=.90c2b8a9-8a21-4495-bab8-9a95e087b2d3@github.com> On Thu, 3 Nov 2022 01:53:22 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Hi, >> >> Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. >> >> This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. 
>> Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't >> affect the rest of the world in theory. >> >> There exists some differences in frame structure between AArch64 and RISC-V. >> For AArch64, we have: >> >> enum { >> link_offset = 0, >> return_addr_offset = 1, >> sender_sp_offset = 2 >> }; >> >> While for RISC-V, we have: >> >> enum { >> link_offset = -2, >> return_addr_offset = -1, >> sender_sp_offset = 0 >> }; >> >> So we need adaptations in some places where the code relies on value of sender_sp_offset to work. >> Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to >> evaluate more on its impact on performance. >> >> Testing on Linux-riscv64 HiFive Unmatched board: >> - Minimal, Client and Server release & fastdebug build OK. >> - Passed tier1-tier4 tests (release build). >> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). >> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Fix Change looks good, thanks. ------------- Marked as reviewed by fjiang (Author).
PR: https://git.openjdk.org/jdk/pull/10917 From fjiang at openjdk.org Thu Nov 3 02:32:04 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 3 Nov 2022 02:32:04 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v3] In-Reply-To: <SmapI7Fdi1koj18uHL6gdma4IxhmBYa3ZAMD-z0rJg4=.4502a5b5-b177-4200-af99-484e52fd85a9@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <SmapI7Fdi1koj18uHL6gdma4IxhmBYa3ZAMD-z0rJg4=.4502a5b5-b177-4200-af99-484e52fd85a9@github.com> Message-ID: <NpjU1efqLHZlGg0VlV1aClZWN71ZwwcW6RQ7xJzsJ_I=.0bf5a9f6-fec2-4189-ada9-717c8936bf64@github.com> On Thu, 3 Nov 2022 02:28:04 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Hi, >> >> Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. >> >> This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. >> Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't >> affect the rest of the world in theory. >> >> There exists some differences in frame structure between AArch64 and RISC-V. >> For AArch64, we have: >> >> enum { >> link_offset = 0, >> return_addr_offset = 1, >> sender_sp_offset = 2 >> }; >> >> While for RISC-V, we have: >> >> enum { >> link_offset = -2, >> return_addr_offset = -1, >> sender_sp_offset = 0 >> }; >> >> So we need adaptations in some places where the code relies on value of sender_sp_offset to work. >> Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to >> evaluate more on its impact on performance. >> >> Testing on Linux-riscv64 HiFive Unmatched board: >> - Minimal, Client and Server release & fastdebug build OK. >> - Passed tier1-tier4 tests (release build). >> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build).
>> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into 8286301 > - Fix > - 8286301: JEP 425 to RISC-V Marked as reviewed by fjiang (Author). ------------- PR: https://git.openjdk.org/jdk/pull/10917 From fjiang at openjdk.org Thu Nov 3 02:32:06 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 3 Nov 2022 02:32:06 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v3] In-Reply-To: <Zgo9ME_GefZzjvx0lvMwtw6oRDIg3OAqBS5-lBbkVbY=.dd8d8ff0-9c2f-42c9-88d6-51f8428aab02@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> <u86mMOShSd9Gt6XnSZ6TpRqreF6QgeMGo11wKXDH4CI=.a71a61f4-b80b-418e-9e82-2de77536ffb4@github.com> <eJPi4KzoXKrXLVE8S-2QkihpRVrAjDwHJA-7A1urGRA=.1fadd116-4d74-49f9-a6a4-1edd1758da33@github.com> <Zgo9ME_GefZzjvx0lvMwtw6oRDIg3OAqBS5-lBbkVbY=.dd8d8ff0-9c2f-42c9-88d6-51f8428aab02@github.com> Message-ID: <eMSDg0g2pzK1dgDqw62NsqukjUtrrG7odpHOV92MS48=.a07e78e8-c9e0-419c-8522-8043a8eb5ed0@github.com> On Thu, 3 Nov 2022 02:09:14 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Looks reasonable. As you mentioned that it's an RVC-related change, do we need this alignment when `UseRVC` is disabled? > > This is necessary to support for RISC-V RVC extension whether it is turned on or off. And I think we are ready to enable use of RVC by default when this hardware extension is available. I see. If `UseRVC` is disabled, it will always be `NativeInstruction::instruction_size` aligned, and `align` will do nothing.
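The alignment point above can be sketched with plain offset arithmetic. This is only an illustration under stated assumptions, not HotSpot code: `padding_for`, `emit`, and the two size constants are hypothetical names, with `kInstructionSize` standing in for `NativeInstruction::instruction_size` (assumed to be 4 bytes) and `kCompressedInstructionSize` for a 2-byte RVC instruction.

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration only (not HotSpot identifiers).
constexpr std::size_t kInstructionSize = 4;            // assumed size of a regular instruction
constexpr std::size_t kCompressedInstructionSize = 2;  // assumed size of an RVC instruction

// Padding bytes needed to round `offset` up to a multiple of `alignment`
// (`alignment` must be a power of two). Returns 0 if already aligned.
constexpr std::size_t padding_for(std::size_t offset, std::size_t alignment) {
  return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}

// Code offset after emitting `count` instructions of `size` bytes from `offset`.
constexpr std::size_t emit(std::size_t offset, std::size_t count, std::size_t size) {
  return offset + count * size;
}

// Only 4-byte instructions: the call site is already aligned, so aligning
// to the instruction size adds no padding.
static_assert(padding_for(emit(0, 3, kInstructionSize), kInstructionSize) == 0,
              "no padding needed without compressed instructions");

// One 2-byte instruction before the call site: the offset is only 2-byte
// aligned, so 2 bytes of padding are needed to make the call patchable.
static_assert(padding_for(emit(emit(0, 3, kInstructionSize), 1, kCompressedInstructionSize),
                          kInstructionSize) == 2,
              "padding needed after a compressed instruction");
```

Under these assumptions the alignment request is free when no compressed instruction precedes the call site, which is why it can stay unconditional.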
------------- PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Thu Nov 3 02:32:06 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Nov 2022 02:32:06 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v3] In-Reply-To: <eMSDg0g2pzK1dgDqw62NsqukjUtrrG7odpHOV92MS48=.a07e78e8-c9e0-419c-8522-8043a8eb5ed0@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <pDmiMhkcx-K_uDfK-d0_2uW4AxMVjng95vwQ8ez3H-M=.ff66f6d3-430a-421f-a5a2-0d9773c96eb5@github.com> <u86mMOShSd9Gt6XnSZ6TpRqreF6QgeMGo11wKXDH4CI=.a71a61f4-b80b-418e-9e82-2de77536ffb4@github.com> <eJPi4KzoXKrXLVE8S-2QkihpRVrAjDwHJA-7A1urGRA=.1fadd116-4d74-49f9-a6a4-1edd1758da33@github.com> <Zgo9ME_GefZzjvx0lvMwtw6oRDIg3OAqBS5-lBbkVbY=.dd8d8ff0-9c2f-42c9-88d6-51f8428aab02@github.com> <eMSDg0g2pzK1dgDqw62NsqukjUtrrG7odpHOV92MS48=.a07e78e8-c9e0-419c-8522-8043a8eb5ed0@github.com> Message-ID: <-ncbe4aqjKImEEcrS29fLEZam8yCX4giZhfDfAPXMDA=.e9d2d3ee-1523-4d14-bb27-82444cb080d4@github.com> On Thu, 3 Nov 2022 02:24:43 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: >> This is necessary to support for RISC-V RVC extension whether it is turned on or off. And I think we are ready to enable use of RVC by default when this hardware extension is available. > > I see. If `UseRVC` is disabled, it will always be `NativeInstruction::instruction_size` aligned, and `align` will do nothing. That's right. Thanks again! 
:-) ------------- PR: https://git.openjdk.org/jdk/pull/10917 From xlinzheng at openjdk.org Thu Nov 3 03:26:26 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Thu, 3 Nov 2022 03:26:26 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v3] In-Reply-To: <SmapI7Fdi1koj18uHL6gdma4IxhmBYa3ZAMD-z0rJg4=.4502a5b5-b177-4200-af99-484e52fd85a9@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <SmapI7Fdi1koj18uHL6gdma4IxhmBYa3ZAMD-z0rJg4=.4502a5b5-b177-4200-af99-484e52fd85a9@github.com> Message-ID: <HICqWdpq2lRxtRIqivy6-z7mlpovRAq8OGdVRE6Q2Xc=.225fa231-798c-4dbf-b94b-01a7ee0a3084@github.com> On Thu, 3 Nov 2022 02:32:00 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Hi, >> >> Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. >> >> This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. >> Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't >> affect the rest of the world in theory. >> >> There exists some differences in frame structure between AArch64 and RISC-V. >> For AArch64, we have: >> >> enum { >> link_offset = 0, >> return_addr_offset = 1, >> sender_sp_offset = 2 >> }; >> >> While for RISC-V, we have: >> >> enum { >> link_offset = -2, >> return_addr_offset = -1, >> sender_sp_offset = 0 >> }; >> >> So we need adapations in some places where the code relies on value of sender_sp_offset to work. >> Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to >> evaluate more on its impact on performance. >> >> Testing on Linux-riscv64 HiFive Unmatched board: >> - Minimal, Client and Server release & fastdebug build OK. >> - Passed tier1-tier4 tests (release build). >> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). 
>> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into 8286301 > - Fix > - 8286301: JEP 425 to RISC-V Nice overall. Please let me approve this for you :-) ------------- Marked as reviewed by xlinzheng (no project role). PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Thu Nov 3 03:26:28 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Nov 2022 03:26:28 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v3] In-Reply-To: <VUwT5X0MCtAVQ0D531lePcYXx8zOI1ckRh63_AzhHRU=.3421aa87-a000-4539-9a9d-b8091df5ac9f@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <VUwT5X0MCtAVQ0D531lePcYXx8zOI1ckRh63_AzhHRU=.3421aa87-a000-4539-9a9d-b8091df5ac9f@github.com> Message-ID: <-ycncxIQyNpinFjD3jcSBESB-Yp4GGZSdZ0Eqvva7DU=.2871f915-cf60-4364-abc8-9e9d22ed8709@github.com> On Wed, 2 Nov 2022 22:22:35 GMT, Dean Long <dlong at openjdk.org> wrote: >> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into 8286301 >> - Fix >> - 8286301: JEP 425 to RISC-V > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1053: > >> 1051: >> 1052: intptr_t* const stack_frame_bottom = ContinuationHelper::InterpretedFrame::frame_bottom(f); >> 1053: assert(stack_frame_bottom - stack_frame_top >= fsize, ""); // == on x86 > > Are there any ports where (stack_frame_bottom - stack_frame_top) != fsize? 
It would be nice if we could use (stack_frame_bottom - stack_frame_top) for fsize and remove the platform-specific computation using frame::metadata_words above. I think aarch64 and riscv are different from x86_64 here due to possible padding in the frame[1][2]. So if we modify this assertion like: diff --git a/src/hotspot/share/runtime/continuationFreezeThaw.cpp b/src/hotspot/share/runtime/continuationFreezeThaw.cpp index 2ef48618ccb..c8e88b67f94 100644 --- a/src/hotspot/share/runtime/continuationFreezeThaw.cpp +++ b/src/hotspot/share/runtime/continuationFreezeThaw.cpp @@ -1050,7 +1050,7 @@ NOINLINE freeze_result FreezeBase::recurse_freeze_interpreted_frame(frame& f, fr const int fsize = f.fp() + frame::metadata_words + locals - stack_frame_top; intptr_t* const stack_frame_bottom = ContinuationHelper::InterpretedFrame::frame_bottom(f); - assert(stack_frame_bottom - stack_frame_top >= fsize, ""); // == on x86 + assert(stack_frame_bottom - stack_frame_top == fsize, ""); // == on x86 DEBUG_ONLY(verify_frame_top(f, stack_frame_top)); Then we will trigger assertion failure on linux-aarch64 running a simple virtual thread demo: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/realfyang/openjdk-jdk/src/hotspot/share/runtime/continuationFreezeThaw.cpp:1053), pid=2680946, tid=2680964 # Error: assert(stack_frame_bottom - stack_frame_top == fsize) failed # # JRE version: OpenJDK Runtime Environment (20.0) (slowdebug build 20-internal-adhoc.realfyang.openjdk-jdk) # Java VM: OpenJDK 64-Bit Server VM (slowdebug 20-internal-adhoc.realfyang.openjdk-jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) # Problematic frame: # V [libjvm.so+0x865c98] FreezeBase::recurse_freeze_interpreted_frame(frame&, frame&, int, bool)+0xb8 # [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/frame_aarch64.hpp#L62 [2]
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/frame_riscv.hpp#L62 ------------- PR: https://git.openjdk.org/jdk/pull/10917 From kbarrett at openjdk.org Thu Nov 3 03:44:47 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 3 Nov 2022 03:44:47 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v3] In-Reply-To: <Jqk_jqj5r60bf_vubYOVphT9LN35_LjcKirN1tC6u6c=.7f1bcbb7-3c1d-4fdf-9f75-517cf8389fb8@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <Jqk_jqj5r60bf_vubYOVphT9LN35_LjcKirN1tC6u6c=.7f1bcbb7-3c1d-4fdf-9f75-517cf8389fb8@github.com> Message-ID: <BpBax5Kk6nSbNNjrG8Bwe56zE9tR1x43trGoSjStmDI=.a0ef75ca-13e8-44f1-83b2-ec8c6d85d4a4@github.com> On Wed, 2 Nov 2022 22:23:57 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. >> Tested with tier1-6. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove now-unused function that I missed. Looks good. Maybe just a little more deletion. src/hotspot/share/gc/z/zHeap.inline.hpp line 48: > 46: inline uint32_t ZHeap::hash_oop(uintptr_t addr) const { > 47: const uintptr_t offset = ZAddress::offset(addr); > 48: return ZHash::address_to_uint32(offset); I think removal of the call to `ZHash::address_to_uint32` means this file no longer needs to include zHash.inline.hpp. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10938 From yadongwang at openjdk.org Thu Nov 3 03:58:25 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Thu, 3 Nov 2022 03:58:25 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v2] In-Reply-To: <6ip7zHZWudSYJgbIkyl3aQmNbY2tCXYJxF7C5V9kwpw=.6b63ea4e-0bab-4bed-a87a-6fb1054d3c03@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <6ip7zHZWudSYJgbIkyl3aQmNbY2tCXYJxF7C5V9kwpw=.6b63ea4e-0bab-4bed-a87a-6fb1054d3c03@github.com> Message-ID: <Ua3-rVnQqg1RQoSBeFA4WE4uHFi-bVmxFzAuphL-vJU=.2b6d683c-4bc3-42b1-92d0-e1c83c2016b8@github.com> On Thu, 3 Nov 2022 01:53:22 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Hi, >> >> Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. >> >> This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. >> Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't >> affect the rest of the world in theory. >> >> There exists some differences in frame structure between AArch64 and RISC-V. >> For AArch64, we have: >> >> enum { >> link_offset = 0, >> return_addr_offset = 1, >> sender_sp_offset = 2 >> }; >> >> While for RISC-V, we have: >> >> enum { >> link_offset = -2, >> return_addr_offset = -1, >> sender_sp_offset = 0 >> }; >> >> So we need adapations in some places where the code relies on value of sender_sp_offset to work. >> Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to >> evaluate more on its impact on performance. >> >> Testing on Linux-riscv64 HiFive Unmatched board: >> - Minimal, Client and Server release & fastdebug build OK. >> - Passed tier1-tier4 tests (release build). >> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). 
>> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Fix lgtm ------------- Marked as reviewed by yadongwang (Author). PR: https://git.openjdk.org/jdk/pull/10917 From stuefe at openjdk.org Thu Nov 3 05:00:31 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 3 Nov 2022 05:00:31 GMT Subject: RFR: JDK-8293346: Add NoThreadCurrentMark to simulate non-attached threads [v2] In-Reply-To: <mH-ubm02SAsJR2tIWaimZ8BJurWhQTx6E6YGONMywGY=.40b28da5-2a3d-46bd-934b-7c0644e5b10e@github.com> References: <ztVVg9Ln8VAJt_JhNVsjLiiVKzVeL7EN0CcakIGsNjo=.ef57732a-9c12-4fb5-97e8-daf89c63daac@github.com> <mH-ubm02SAsJR2tIWaimZ8BJurWhQTx6E6YGONMywGY=.40b28da5-2a3d-46bd-934b-7c0644e5b10e@github.com> Message-ID: <-ZPCxcKv3-txEQdyt16_Ju_jmiB2T5E-9IEfe2C6Thg=.dd04586f-a605-4cf3-b144-51f7c41e37b8@github.com> On Thu, 22 Sep 2022 17:08:02 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> When working on [JDK-8293344](https://bugs.openjdk.org/browse/JDK-8293344), I needed a way to verify that I had removed all offending usages of ResourceArea memory inside an extend. >> >> For that, a NoThreadCurrentMark was useful. It temporarily sets the current Thread* to NULL for the extend. Any use of Thread::current below that mark will now crash or assert. >> >> This can be used for two things: >> - guard code that is supposed to be safe for non-attached threads (os::malloc, stack printing, UL etc) against accidental usage of Thread::current() (e.g. resource area allocation) >> - in gtests, simulate a non-attached thread to cover code that behaves differently with Thread::current()==NULL. 
>> >> This patch: >> >> - Introduces the debug-only `NoThreadCurrentMark` >> - Adds a gtest for it >> - Adds it to guard dwarf-parsing-based stack printing, which had not been safe due to use of ResourceArea, fixed with [JDK-8293344](https://bugs.openjdk.org/browse/JDK-8293344) >> - Adds it to `os::malloc()`, `os::realloc()` and `os::free()`, since those can certainly be called without a current thread. >> - Replaces the test-local mark in the SafeFetch gtests that did a similar thing >> - I may add and fix up other users later >> >> I also had to change Library-based TLS initialization such that it runs at C++ dynamic initialization time. That is because the `NoThreadCurrentMark` needs to set both Library-based TLS and C++ TLS slots, and having library-based TLS not available before VM initialization made the code too complex. This also means we can remove the explicit TLS initialization from create_vm. >> >> Finally, while I was here, I also fixed the error printing in library based TLS (since the pthread_key_xxx functions don't modify errno). > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - New solution > - Revert changes to TLS initialization I withdraw this since I don't have time to work on it. I think the idea was sound, but ultimately it's not that important.
------------- PR: https://git.openjdk.org/jdk/pull/10178 From stuefe at openjdk.org Thu Nov 3 05:00:32 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 3 Nov 2022 05:00:32 GMT Subject: Withdrawn: JDK-8293346: Add NoThreadCurrentMark to simulate non-attached threads In-Reply-To: <ztVVg9Ln8VAJt_JhNVsjLiiVKzVeL7EN0CcakIGsNjo=.ef57732a-9c12-4fb5-97e8-daf89c63daac@github.com> References: <ztVVg9Ln8VAJt_JhNVsjLiiVKzVeL7EN0CcakIGsNjo=.ef57732a-9c12-4fb5-97e8-daf89c63daac@github.com> Message-ID: <QOflhLuUCcFheOW_2UZfCDyJn_o45S6Xu5eZaLnhIRk=.ddbfa4a8-2de3-4270-a564-aa1b5eb3ce8f@github.com> On Tue, 6 Sep 2022 06:37:35 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > When working on [JDK-8293344](https://bugs.openjdk.org/browse/JDK-8293344), I needed a way to verify that I had removed all offending usages of ResourceArea memory inside an extend. > > For that, a NoThreadCurrentMark was useful. It temporarily sets the current Thread* to NULL for the extend. Any use of Thread::current below that mark will now crash or assert. > > This can be used for two things: > - guard code that is supposed to be safe for non-attached threads (os::malloc, stack printing, UL etc) against accidental usage of Thread::current() (e.g. resource area allocation) > - in gtests, simulate a non-attached thread to cover code that behaves differently with Thread::current()==NULL. > > This patch: > > - Introduces the debug-only `NoThreadCurrentMark` > - Adds a gtest for it > - Adds it to guard dwarf-parsing-based stack printing, which had not been safe due to use of ResourceArea, fixed with [JDK-8293344](https://bugs.openjdk.org/browse/JDK-8293344) > - Adds it to `os::malloc()`, `os::realloc()` and `os::free()`, since those can certainly be called without a current thread. 
> - Replaces the test-local mark in the SafeFetch gtests that did a similar thing > - I may add and fix up other users later > > I also had to change Library-based TLS initialization such that it runs at C++ dynamic initialization time. That is because the `NoThreadCurrentMark` needs to set both Library-based TLS and C++ TLS slots, and having library-based TLS not available before VM initialization made the code too complex. This also means we can remove the explicit TLS initialization from create_vm. > > Finally, while I was here, I also fixed the error printing in library based TLS (since the pthread_key_xxx functions don't modify errno). This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/10178 From eosterlund at openjdk.org Thu Nov 3 05:19:13 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 3 Nov 2022 05:19:13 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v3] In-Reply-To: <BpBax5Kk6nSbNNjrG8Bwe56zE9tR1x43trGoSjStmDI=.a0ef75ca-13e8-44f1-83b2-ec8c6d85d4a4@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <Jqk_jqj5r60bf_vubYOVphT9LN35_LjcKirN1tC6u6c=.7f1bcbb7-3c1d-4fdf-9f75-517cf8389fb8@github.com> <BpBax5Kk6nSbNNjrG8Bwe56zE9tR1x43trGoSjStmDI=.a0ef75ca-13e8-44f1-83b2-ec8c6d85d4a4@github.com> Message-ID: <fZaV6iW2p_wPAeMUXrLfaxTuIlS-9g6dRx7ravAnKWM=.ead99bb0-1c45-4010-858d-797845f5a48b@github.com> On Thu, 3 Nov 2022 03:41:26 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove now-unused function that I missed. 
> > src/hotspot/share/gc/z/zHeap.inline.hpp line 48: > >> 46: inline uint32_t ZHeap::hash_oop(uintptr_t addr) const { >> 47: const uintptr_t offset = ZAddress::offset(addr); >> 48: return ZHash::address_to_uint32(offset); > > I think removal of the call to `ZHash::address_to_uint32` means this file no longer needs to include zHash.inline.hpp. I think you are right Kim. ------------- PR: https://git.openjdk.org/jdk/pull/10938 From eosterlund at openjdk.org Thu Nov 3 05:31:41 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 3 Nov 2022 05:31:41 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v3] In-Reply-To: <Jqk_jqj5r60bf_vubYOVphT9LN35_LjcKirN1tC6u6c=.7f1bcbb7-3c1d-4fdf-9f75-517cf8389fb8@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <Jqk_jqj5r60bf_vubYOVphT9LN35_LjcKirN1tC6u6c=.7f1bcbb7-3c1d-4fdf-9f75-517cf8389fb8@github.com> Message-ID: <YRt5YwleIf5ktBSxk6Rhv-cJOReHEaJjsOJ_ZEM7SY0=.12563577-76fa-473b-a7bf-1b94e4d38b5a@github.com> On Wed, 2 Nov 2022 22:23:57 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. >> Tested with tier1-6. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove now-unused function that I missed. I love the use of identity hash code instead of address bits. There might be an issue with displaced markWords though where we need to be careful. src/hotspot/share/prims/jvmtiTagMapTable.cpp line 116: > 114: > 115: JvmtiTagMapEntry* JvmtiTagMapTable::find(oop obj) { > 116: if (obj->has_no_hash()) { This new function you added checks if the markWord has a hashCode. If there is a displaced markWord, then it very well might be that there is a hashCode, but it is in the displaced markWord - either in a stack lock or an ObjectMonitor. 
Bailing here does not seem correct, as it might actually be in the table even if there is no hashCode in the markWord. Is this an optimization? ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/10938 From fyang at openjdk.org Thu Nov 3 06:59:28 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Nov 2022 06:59:28 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v3] In-Reply-To: <SmapI7Fdi1koj18uHL6gdma4IxhmBYa3ZAMD-z0rJg4=.4502a5b5-b177-4200-af99-484e52fd85a9@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <SmapI7Fdi1koj18uHL6gdma4IxhmBYa3ZAMD-z0rJg4=.4502a5b5-b177-4200-af99-484e52fd85a9@github.com> Message-ID: <lLGu_SLFxFehmpzUVTTYHUK41Shtf8twJx_wLnWNk08=.12c69503-8083-4341-bfa4-33d52236612b@github.com> On Thu, 3 Nov 2022 02:32:00 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Hi, >> >> Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. >> >> This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. >> Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't >> affect the rest of the world in theory. >> >> There exists some differences in frame structure between AArch64 and RISC-V. >> For AArch64, we have: >> >> enum { >> link_offset = 0, >> return_addr_offset = 1, >> sender_sp_offset = 2 >> }; >> >> While for RISC-V, we have: >> >> enum { >> link_offset = -2, >> return_addr_offset = -1, >> sender_sp_offset = 0 >> }; >> >> So we need adapations in some places where the code relies on value of sender_sp_offset to work. >> Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to >> evaluate more on its impact on performance. >> >> Testing on Linux-riscv64 HiFive Unmatched board: >> - Minimal, Client and Server release & fastdebug build OK. >> - Passed tier1-tier4 tests (release build). 
>> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). >> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into 8286301 > - Fix > - 8286301: JEP 425 to RISC-V Thanks all for looking at this non-trivial change! @shipilev : Want to take a look? ------------- PR: https://git.openjdk.org/jdk/pull/10917 From epeter at openjdk.org Thu Nov 3 07:13:31 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Nov 2022 07:13:31 GMT Subject: RFR: 8279913: obsolete ExtendedDTraceProbes In-Reply-To: <jOfxFEadQXT3IINQMr5s4-Ep_5KIePtl8ictr2Y8YN8=.2aa384be-ac4d-4d3f-be2b-e1f413b0d54f@github.com> References: <MRTShFTi5PeLO9VkJBYdMQRrwQRDhuiCY8U6-g0E-Ww=.0b0bb1d7-1edf-47af-8e53-4dbac100abc2@github.com> <jOfxFEadQXT3IINQMr5s4-Ep_5KIePtl8ictr2Y8YN8=.2aa384be-ac4d-4d3f-be2b-e1f413b0d54f@github.com> Message-ID: <gkxxU_SnB-kxDV3A5u1OPVPZ8WYM5NEPqKrfA1mMsMo=.fe0397b6-9177-4472-810e-1461fd2bf0d6@github.com> On Wed, 2 Nov 2022 08:51:34 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> Obsoleted ExtendedDTraceProbes. >> Removed all uses of the flag, it now shows this warning when used: >> `Ignoring option ExtendedDTraceProbes; support was removed in 20.0` >> >> Documentation was already changed in [JDK-8279047](https://bugs.openjdk.org/browse/JDK-8279047) >> >> Verified warning message in dtrace build, and regular build. >> Ran automatic regression tests. > > Looks good to me. 
Thanks @TobiHartmann and @chhagedorn for the reviews :) ------------- PR: https://git.openjdk.org/jdk/pull/10930 From epeter at openjdk.org Thu Nov 3 07:13:31 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Nov 2022 07:13:31 GMT Subject: Integrated: 8279913: obsolete ExtendedDTraceProbes In-Reply-To: <MRTShFTi5PeLO9VkJBYdMQRrwQRDhuiCY8U6-g0E-Ww=.0b0bb1d7-1edf-47af-8e53-4dbac100abc2@github.com> References: <MRTShFTi5PeLO9VkJBYdMQRrwQRDhuiCY8U6-g0E-Ww=.0b0bb1d7-1edf-47af-8e53-4dbac100abc2@github.com> Message-ID: <UgMBCPG92eNrLJvZrJhsHCjS0tKgc5TeKiQNJdC_alM=.6037ed2e-4804-4ec2-be21-0fa8bb4f56fd@github.com> On Tue, 1 Nov 2022 10:38:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote: > Obsoleted ExtendedDTraceProbes. > Removed all uses of the flag, it now shows this warning when used: > `Ignoring option ExtendedDTraceProbes; support was removed in 20.0` > > Documentation was already changed in [JDK-8279047](https://bugs.openjdk.org/browse/JDK-8279047) > > Verified warning message in dtrace build, and regular build. > Ran automatic regression tests. This pull request has now been integrated. 
Changeset: 19507470 Author: Emanuel Peter <epeter at openjdk.org> URL: https://git.openjdk.org/jdk/commit/19507470c458182be04cdd75b5b819013c5e0115 Stats: 39 lines in 5 files changed: 0 ins; 34 del; 5 mod 8279913: obsolete ExtendedDTraceProbes Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/10930 From stefank at openjdk.org Thu Nov 3 11:16:31 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Nov 2022 11:16:31 GMT Subject: RFR: 8296231: Fix MEMFLAGS for CHeapBitMaps In-Reply-To: <UzAin3odGxyaz9-ebSXSRIQJLhB3Zk0MqHGKX5FjktM=.e2b10b4a-63db-4c74-9723-7974234bd9f2@github.com> References: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> <UzAin3odGxyaz9-ebSXSRIQJLhB3Zk0MqHGKX5FjktM=.e2b10b4a-63db-4c74-9723-7974234bd9f2@github.com> Message-ID: <GpaUelnhcPKyjFg1aVE95M66RksgAklLtCaCADp1cDM=.0413b46c-8beb-4c44-82ef-7583d3da2c31@github.com> On Wed, 2 Nov 2022 14:56:55 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > > I'd prefer to remove the default value, but there is currently a PR touching the BitMap classes, so I'd like to limit this Bug to only fixing the incorrect usage of mtInternal. > > I always wanted something like a thread-bound default Memflags, that could be set with something like MemFlagsMark. It would define the Memflags to be used for the extent if no explicit Memflags are given. This would make sense, especially for utility classes that are used on behalf of other code. For example, wherever we dive into Metaspace allocation, we set the default flag to mtMetaspace, which is probably a good default for all subsequent mallocs in this extent. > > What do you think? I think it is unclear if this would be a net benefit for the code base. IMHO, it's easier to mess up by forgetting to add a MemFlagsMark, than if the compiler straight up told you that you need to provide a MEMFLAGS. 
It's also harder to look at a CHeap allocating call-site and figure out where the MemFlagsMark is located. I'm interested in hearing what others think. ------------- PR: https://git.openjdk.org/jdk/pull/10948 From aph at openjdk.org Thu Nov 3 11:17:49 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 3 Nov 2022 11:17:49 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references In-Reply-To: <H7qFAMewkLJ4IWrVipLemv1iBgI5qrWOM1pfJ0p6hGk=.315fce9d-2c7d-42b8-a569-c74d8c7097f2@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <H7qFAMewkLJ4IWrVipLemv1iBgI5qrWOM1pfJ0p6hGk=.315fce9d-2c7d-42b8-a569-c74d8c7097f2@github.com> Message-ID: <WK7Sg9jDAwczPdU4Hax_iFJHBpipKrTBwAyXuX7IdlQ=.32813ab5-8c25-4dfe-9bc9-90d17c462af8@github.com> On Tue, 1 Nov 2022 17:43:55 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: > Changes are good. Can you tell more about `-fsanitize=null` effect on libjvm size and performance of fastdebug build we use in testing? If it is only few percents I am for enabling it in debug build. It might be a bit more than that: it's a test-and-branch on every memory access. Maybe enable it only on a non-optimized build? 
------------- PR: https://git.openjdk.org/jdk/pull/10920 From coleenp at openjdk.org Thu Nov 3 11:50:56 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Nov 2022 11:50:56 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v3] In-Reply-To: <YRt5YwleIf5ktBSxk6Rhv-cJOReHEaJjsOJ_ZEM7SY0=.12563577-76fa-473b-a7bf-1b94e4d38b5a@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <Jqk_jqj5r60bf_vubYOVphT9LN35_LjcKirN1tC6u6c=.7f1bcbb7-3c1d-4fdf-9f75-517cf8389fb8@github.com> <YRt5YwleIf5ktBSxk6Rhv-cJOReHEaJjsOJ_ZEM7SY0=.12563577-76fa-473b-a7bf-1b94e4d38b5a@github.com> Message-ID: <jddC9jNIpq5793XrGCrXp5X0kNDaDRPISE9a2Nl3khY=.9f8df9ec-f27c-4f51-9de7-5e9451e420cd@github.com> On Thu, 3 Nov 2022 05:28:23 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove now-unused function that I missed. > > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 116: > >> 114: >> 115: JvmtiTagMapEntry* JvmtiTagMapTable::find(oop obj) { >> 116: if (obj->has_no_hash()) { > > This new function you added checks if the markWord has a hashCode. If there is a displaced markWord, then it very well might be that there is a hashCode, but it is in the displaced markWord - either in a stack lock or an ObjectMonitor. Bailing here does not seem correct, as it might actually be in the table even if there is no hashCode in the markWord. Is this an optimization? It is an optimization. I don't think we want to create an identity hash for all oops just for lookup. Is there a better way to find out if an oop has a hashCode?
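One conservative shape for such a check, sketched on a toy header model rather than the real markWord: answer "no hash" only when the header is not displaced, and fall through to the normal lookup otherwise. `ToyHeader`, `certainly_no_hash` and `can_skip_lookup` are hypothetical names for illustration and do not reflect the HotSpot layout or API.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of an object header; NOT the HotSpot markWord layout.
struct ToyHeader {
  bool displaced;      // true if the real header lives in a stack lock or monitor
  std::uint32_t hash;  // identity hash stored inline; 0 means "not set"
};

// Returns true only when the object certainly has no identity hash.
// If the header is displaced, the hash may live elsewhere, so the answer
// is "unknown" and the function reports false.
inline bool certainly_no_hash(const ToyHeader& h) {
  return !h.displaced && h.hash == 0;
}

// A table keyed by identity hash cannot contain an object that has no hash,
// so the lookup may be skipped exactly when the fast check says "certainly none".
inline bool can_skip_lookup(const ToyHeader& h) {
  return certainly_no_hash(h);
}
```

The point of the sketch is that the fast path never creates a hash and never gives a wrong "not in the table" answer; the displaced case simply pays for the full lookup.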
------------- PR: https://git.openjdk.org/jdk/pull/10938 From fyang at openjdk.org Thu Nov 3 12:25:39 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Nov 2022 12:25:39 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v2] In-Reply-To: <5X8pxLpqNNSSUp0ESbFBUJPdqGjtLPhFeGp29uB2jDU=.b4f16caa-540d-41f9-96e3-0208569f8ad0@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <KJRBGLyEsrMHOwR4lQVvm-jr2GWHu_glKKNvb7PF7q0=.9d71a4d8-21d8-450b-8c4b-aed36bdb3cbb@github.com> <YGN0d-bKsd_caM3cSnqHFI-rWtczvFJ3nwAynvoxdmw=.3a7ec3d1-6e89-45d8-b148-d2906fd877b5@github.com> <5X8pxLpqNNSSUp0ESbFBUJPdqGjtLPhFeGp29uB2jDU=.b4f16caa-540d-41f9-96e3-0208569f8ad0@github.com> Message-ID: <MGQiYzqL8kdTvmaArf2gB47oICqxMv4VcC-njZr2e8E=.c28da921-869f-4db6-ac26-ba5e47ce05c9@github.com> On Wed, 2 Nov 2022 13:19:56 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> src/hotspot/cpu/riscv/riscv.ad line 5196: >> >>> 5194: >>> 5195: ins_encode %{ >>> 5196: __ addi(t0, as_Register($mem$$base), $mem$$disp); >> >> This might be further improved as I see prefetch instructions can receive some immediate offset. > > The offset needs to be aligned on 32 bytes (the lower 5 bits must be zero). There is then no guarantee that `$mem$$base + ($mem$$disp & ~((1<<5)-1)` is still on the same cache line. It's then easier to do a prefetch of `base+disp` with `offset = 0`. But what if we are passed some $mem$$disp which is a multiple of 32 and thus satisfies the constraint? Then this "addi" instruction could be optimized out, right? Because we could encode $mem$$disp in the offset field. >> src/hotspot/os_cpu/linux_riscv/prefetch_linux_riscv.inline.hpp line 36: >> >>> 34: (void (*)(const void*, intptr_t))StubRoutines::riscv::prefetch_r(); >>> 35: if (interval >= 0 && stub != NULL) { >>> 36: stub(loc, interval); >> >> I am not sure if it really worth it to call a stub for read / write here.
It looks to me not a big issue for the case the stub tries to catch and resolve. And I see aarch64 simply plant a 'prfm' instruction for prefetching [1]. I guess we might can do the same? >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/prefetch_linux_aarch64.inline.hpp#L34 > > We would need to check for `UseZicbop` in any case; the access to a global variable is then required. > > It would be the same issue as https://github.com/openjdk/jdk/pull/10884/files/e968f7164124dcf560807c9ff7765e6f82b64cdd#diff-e3c18b8b83898e82b5a3069319df6a47468e91cc2527bf065e704a685a20f26bR5196 without the stub. > > I've to admit that the `interval` naming here is confusing since no implementation ever uses it as an interval but alway as an offset. Also, the callers assume it to be an offset, like `ContiguousSpace::prepare_for_compaction` for example. Yes, I agree that a check for UseZicbop option would be necessary. But I still don't understand why we should implement this through a stub here. It looks to me that CPP code with inline assembly would also do. At least this could help eliminate the prologue & epilogue cost of calling the stub. 
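The first question in this message, whether the `addi` can be dropped when the displacement already satisfies the constraint, comes down to a small predicate. The sketch below is an illustration only: `fits_prefetch_offset`, `PrefetchPlan` and `plan_prefetch` are hypothetical names, the 32-byte alignment rule is the one stated in the thread, and the signed 12-bit range is an assumption about the immediate field that should be checked against the Zicbop specification.

```cpp
#include <cassert>
#include <cstdint>

// True if `disp` could be placed directly in the prefetch offset field:
// the low 5 bits must be zero (stated in the thread), and the value is
// assumed to have to fit a signed 12-bit immediate (assumption, see lead-in).
constexpr bool fits_prefetch_offset(std::int64_t disp) {
  return (disp & 0x1f) == 0 && disp >= -2048 && disp <= 2047;
}

// Result of the decision: either encode `disp` as the offset and skip the
// extra add, or compute base+disp into a scratch register and use offset 0.
struct PrefetchPlan {
  bool needs_addi;
  std::int64_t offset;
};

constexpr PrefetchPlan plan_prefetch(std::int64_t disp) {
  return fits_prefetch_offset(disp) ? PrefetchPlan{false, disp}
                                    : PrefetchPlan{true, 0};
}
```

A displacement of 96 would be encoded directly, while 40 (not a multiple of 32) or 4096 (outside the assumed range) would keep the `addi` and prefetch with offset 0.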
------------- PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Thu Nov 3 12:31:26 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 3 Nov 2022 12:31:26 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v2] In-Reply-To: <MGQiYzqL8kdTvmaArf2gB47oICqxMv4VcC-njZr2e8E=.c28da921-869f-4db6-ac26-ba5e47ce05c9@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <KJRBGLyEsrMHOwR4lQVvm-jr2GWHu_glKKNvb7PF7q0=.9d71a4d8-21d8-450b-8c4b-aed36bdb3cbb@github.com> <YGN0d-bKsd_caM3cSnqHFI-rWtczvFJ3nwAynvoxdmw=.3a7ec3d1-6e89-45d8-b148-d2906fd877b5@github.com> <5X8pxLpqNNSSUp0ESbFBUJPdqGjtLPhFeGp29uB2jDU=.b4f16caa-540d-41f9-96e3-0208569f8ad0@github.com> <MGQiYzqL8kdTvmaArf2gB47oICqxMv4VcC-njZr2e8E=.c28da921-869f-4db6-ac26-ba5e47ce05c9@github.com> Message-ID: <Euw0PobjClWV1VKRqb2OqP4lsm3rhyOzyYu91Yu7bYs=.830de65c-558e-464d-b3ee-dca2d088b274@github.com> On Thu, 3 Nov 2022 12:20:06 GMT, Fei Yang <fyang at openjdk.org> wrote: >> We would need to check for `UseZicbop` in any case; the access to a global variable is then required. >> >> It would be the same issue as https://github.com/openjdk/jdk/pull/10884/files/e968f7164124dcf560807c9ff7765e6f82b64cdd#diff-e3c18b8b83898e82b5a3069319df6a47468e91cc2527bf065e704a685a20f26bR5196 without the stub. >> >> I've to admit that the `interval` naming here is confusing since no implementation ever uses it as an interval but alway as an offset. Also, the callers assume it to be an offset, like `ContiguousSpace::prepare_for_compaction` for example. > > Yes, I agree that a check for UseZicbop option would be necessary. But I still don't understand why we should implement this through a stub here. It looks to me that CPP code with inline assembly would also do. At least this could help eliminate the prologue & epilogue cost of calling the stub. 
We can definitely go with inline assembly here, just not sure that `prefetch.r` or `prefetch.w` would be recognized by the assembler. However, given that `prefetch.*` is an `ori` under the hood, we can simply use that. ------------- PR: https://git.openjdk.org/jdk/pull/10884 From coleenp at openjdk.org Thu Nov 3 12:46:00 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Nov 2022 12:46:00 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v4] In-Reply-To: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> Message-ID: <B-1BbYX4v8UXrPWEFzp1pU9grDEVWdK1iG1DoK-rq5E=.44d1dd21-7bd8-4bf7-a034-17207044a33a@github.com> > Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. > Tested with tier1-6. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix has_no_hash into fast_no_hash_check(). 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/10938/files - new: https://git.openjdk.org/jdk/pull/10938/files/f214791d..8770d38c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10938&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10938&range=02-03 Stats: 8 lines in 4 files changed: 3 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10938.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10938/head:pull/10938 PR: https://git.openjdk.org/jdk/pull/10938 From coleenp at openjdk.org Thu Nov 3 12:46:03 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Nov 2022 12:46:03 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v3] In-Reply-To: <jddC9jNIpq5793XrGCrXp5X0kNDaDRPISE9a2Nl3khY=.9f8df9ec-f27c-4f51-9de7-5e9451e420cd@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <Jqk_jqj5r60bf_vubYOVphT9LN35_LjcKirN1tC6u6c=.7f1bcbb7-3c1d-4fdf-9f75-517cf8389fb8@github.com> <YRt5YwleIf5ktBSxk6Rhv-cJOReHEaJjsOJ_ZEM7SY0=.12563577-76fa-473b-a7bf-1b94e4d38b5a@github.com> <jddC9jNIpq5793XrGCrXp5X0kNDaDRPISE9a2Nl3khY=.9f8df9ec-f27c-4f51-9de7-5e9451e420cd@github.com> Message-ID: <P1ph_cwNKoSK17UrA5XWkmy77beRv2Eudg70lYQbLqw=.3ec087a4-bc9f-4734-b2c9-3158dd363f4e@github.com> On Thu, 3 Nov 2022 11:39:51 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> src/hotspot/share/prims/jvmtiTagMapTable.cpp line 116: >> >>> 114: >>> 115: JvmtiTagMapEntry* JvmtiTagMapTable::find(oop obj) { >>> 116: if (obj->has_no_hash()) { >> >> This new function you added checks if the markWord has a hashCode. If there is a displaced markWord, then it very well might be that there is a hashCode, but it is in the displaced markWord - either in a stack lock or an ObjectMonitor. Bailing here does not seem correct, as it might actually be in the table even if there is no hashCode in the markWord. Is this an optimization? > > It is an optimization. 
I don't think we want to create an identity hash for all oops just for lookup. Is there a better way to find out whether an oop has a hashCode? I was hoping there was just a bit but you're right. I renamed it to fast_no_hash_check(), which only returns true if the object is unlocked, and added a comment. ------------- PR: https://git.openjdk.org/jdk/pull/10938 From coleenp at openjdk.org Thu Nov 3 12:46:03 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Nov 2022 12:46:03 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v3] In-Reply-To: <P1ph_cwNKoSK17UrA5XWkmy77beRv2Eudg70lYQbLqw=.3ec087a4-bc9f-4734-b2c9-3158dd363f4e@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <Jqk_jqj5r60bf_vubYOVphT9LN35_LjcKirN1tC6u6c=.7f1bcbb7-3c1d-4fdf-9f75-517cf8389fb8@github.com> <YRt5YwleIf5ktBSxk6Rhv-cJOReHEaJjsOJ_ZEM7SY0=.12563577-76fa-473b-a7bf-1b94e4d38b5a@github.com> <jddC9jNIpq5793XrGCrXp5X0kNDaDRPISE9a2Nl3khY=.9f8df9ec-f27c-4f51-9de7-5e9451e420cd@github.com> <P1ph_cwNKoSK17UrA5XWkmy77beRv2Eudg70lYQbLqw=.3ec087a4-bc9f-4734-b2c9-3158dd363f4e@github.com> Message-ID: <3nvtr_Oq3qlti6PwenIt2Vykd4bugYyBAufKPgWS0JI=.cfda77dd-18eb-4690-a11f-f73c0a62448f@github.com> On Thu, 3 Nov 2022 12:39:28 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> It is an optimization. I don't think we want to create an identity hash for all oops just for lookup. Is there a better way to find out whether an oop has a hashCode? > > I was hoping there was just a bit but you're right. I renamed it to fast_no_hash_check(), which only returns true if the object is unlocked, and added a comment. I'm rerunning jvmti and jdi tests locally.
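To make the displaced-header concern concrete, here is a toy model of the conservative check. It is only a sketch: `ToyMark`, `make_mark` and the bit layout (low two bits for the lock state, remaining bits for the hash, zero meaning "no hash") are simplifications assumed for illustration and are not HotSpot's `markWord`. The function name mirrors the one from the PR, but the body is a guess at the idea, not the committed code.

```cpp
#include <cassert>
#include <cstdint>

// Toy lock states kept in the low two bits of the header (illustrative only).
enum LockState : uint64_t { kStackLocked = 0, kUnlocked = 1, kInflated = 2 };

struct ToyMark {
  uint64_t bits;
  LockState state() const { return static_cast<LockState>(bits & 3); }
  // Only meaningful when the header is not displaced, i.e. when unlocked.
  uint64_t hash_in_header() const { return bits >> 2; }
};

inline ToyMark make_mark(uint64_t hash, LockState state) {
  return ToyMark{(hash << 2) | state};
}

// Returns true only when we can be sure there is no identity hash.
// For a locked object the real header lives elsewhere (stack lock or monitor),
// so the hash may exist even though it is not visible here: answer "don't know".
inline bool fast_no_hash_check(ToyMark m) {
  return m.state() == kUnlocked && m.hash_in_header() == 0;
}
```

With this shape, a `true` result lets `find()` bail out early, while a `false` result only means the caller has to take the slower path and do the actual lookup.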
------------- PR: https://git.openjdk.org/jdk/pull/10938 From eosterlund at openjdk.org Thu Nov 3 13:31:30 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 3 Nov 2022 13:31:30 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v4] In-Reply-To: <B-1BbYX4v8UXrPWEFzp1pU9grDEVWdK1iG1DoK-rq5E=.44d1dd21-7bd8-4bf7-a034-17207044a33a@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <B-1BbYX4v8UXrPWEFzp1pU9grDEVWdK1iG1DoK-rq5E=.44d1dd21-7bd8-4bf7-a034-17207044a33a@github.com> Message-ID: <nwPdd7tXkq39xk9IJI7vW-TZcJ7AGftOADYKXgeLSiA=.89c3ba06-2a4c-4eaf-968d-9150f15587c3@github.com> On Thu, 3 Nov 2022 12:46:00 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. >> Tested with tier1-6. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix has_no_hash into fast_no_hash_check(). Looks good now to me! ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/10938 From coleenp at openjdk.org Thu Nov 3 13:53:31 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Nov 2022 13:53:31 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v4] In-Reply-To: <B-1BbYX4v8UXrPWEFzp1pU9grDEVWdK1iG1DoK-rq5E=.44d1dd21-7bd8-4bf7-a034-17207044a33a@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <B-1BbYX4v8UXrPWEFzp1pU9grDEVWdK1iG1DoK-rq5E=.44d1dd21-7bd8-4bf7-a034-17207044a33a@github.com> Message-ID: <yZ2f4_BJgnOuqLfGtPO0OMkA9gEZ4tPjcPFncyffyKo=.fc52834b-721b-4812-bf38-f9edb5aaa410@github.com> On Thu, 3 Nov 2022 12:46:00 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Use identity_hash for objects in the JVMTI TagMap table. 
If the object has no hashcode, it's not in the table. >> Tested with tier1-6. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix has_no_hash into fast_no_hash_check(). Thanks Erik and Kim for reviewing! ------------- PR: https://git.openjdk.org/jdk/pull/10938 From matsaave at openjdk.org Thu Nov 3 14:50:49 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 3 Nov 2022 14:50:49 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v9] In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: <m7cjzZhyItkm0kZxy1xOd5je44p0YVZ2ecV8fc4O5cc=.5e9dd8af-86bd-47de-809b-23dd9f8cc126@github.com> > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. 
> > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 <Concat0.s/Ljava/lang/String;> > 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> > BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' 
@16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 <Concat0.d/Ljava/lang/String;> > 18 iinc #1 1 > 21 goto 2 > 24 return Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed gtest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10860/files - new: https://git.openjdk.org/jdk/pull/10860/files/05495a94..7cfe1f5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From luhenry at openjdk.org Thu Nov 3 15:22:17 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 3 Nov 2022 15:22:17 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v2] In-Reply-To: <MGQiYzqL8kdTvmaArf2gB47oICqxMv4VcC-njZr2e8E=.c28da921-869f-4db6-ac26-ba5e47ce05c9@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> 
<KJRBGLyEsrMHOwR4lQVvm-jr2GWHu_glKKNvb7PF7q0=.9d71a4d8-21d8-450b-8c4b-aed36bdb3cbb@github.com> <YGN0d-bKsd_caM3cSnqHFI-rWtczvFJ3nwAynvoxdmw=.3a7ec3d1-6e89-45d8-b148-d2906fd877b5@github.com> <5X8pxLpqNNSSUp0ESbFBUJPdqGjtLPhFeGp29uB2jDU=.b4f16caa-540d-41f9-96e3-0208569f8ad0@github.com> <MGQiYzqL8kdTvmaArf2gB47oICqxMv4VcC-njZr2e8E=.c28da921-869f-4db6-ac26-ba5e47ce05c9@github.com> Message-ID: <8B_7dzG0LXHqRpVHai6eCLPzGifNg8uradUiTXe4WXQ=.68426470-bfc4-43a0-8b67-51f349a39b2e@github.com> On Thu, 3 Nov 2022 12:16:41 GMT, Fei Yang <fyang at openjdk.org> wrote: >> The offset needs to be aligned on 32 bytes (the lower 5 bits must be zero). There is then no guarantee that `$mem$$base + ($mem$$disp & ~((1<<5)-1))` is still on the same cache line. It's then easier to do a prefetch of `base+disp` with `offset = 0`. > > But what if we are passed some $mem$$disp which is a multiple of 32 and thus satisfies the constraint? Then this "addi" instruction could be optimized out, right? Because we could encode $mem$$disp in the offset field. We want to guarantee that `$mem$$base + $mem$$disp` is on the same cache line as `$mem$$base + ($mem$$disp & ~0x1f)` or even `$mem$$base + (($mem$$disp & ~0x1f) + 0x20)`. It's easy to find cases that trip these two possible solutions: 1. for `$mem$$base + ($mem$$disp & ~0x1f)`, if `$mem$$base = 0x30` and `$mem$$disp = 0x10`, then `$mem$$base + $mem$$disp = 0x40`, while `$mem$$base + ($mem$$disp & ~0x1f) = 0x30`, which are not on the same 64-byte cache line. 2. for `$mem$$base + ($mem$$disp & ~0x1f) + 0x20`, if `$mem$$base = 0x30` and `$mem$$disp = 0x0`, then `$mem$$base + $mem$$disp = 0x30`, while `$mem$$base + ($mem$$disp & ~0x1f) + 0x20 = 0x50`, which again are not on the same 64-byte cache line. The simplest and most accurate solution is the `__ addi` just before. Given that it's an immediate add and the disp values are at a maximum of `64 * CacheLineSize`, it's always going to fit in `addi` immediate.
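The two counterexamples above can be checked mechanically. The snippet below is a minimal sketch rather than code from the PR: it assumes a 64-byte cache line, as in the examples, and the helper names (`cache_line`, `round_down_32`, `same_cache_line`) are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: 64-byte cache lines, as in the examples above.
constexpr uint64_t kCacheLineSize = 64;

// Index of the cache line that contains `addr`.
constexpr uint64_t cache_line(uint64_t addr) { return addr / kCacheLineSize; }

// Clear the low 5 bits, i.e. the part a Zicbop prefetch offset cannot encode.
constexpr uint64_t round_down_32(uint64_t disp) { return disp & ~uint64_t{0x1f}; }

constexpr bool same_cache_line(uint64_t a, uint64_t b) {
  return cache_line(a) == cache_line(b);
}

// Case 1: base = 0x30, disp = 0x10. The intended address is 0x40, but
// rounding the displacement down prefetches 0x30, one line too early.
static_assert(!same_cache_line(0x30 + 0x10, 0x30 + round_down_32(0x10)),
              "rounding down can land on the previous line");

// Case 2: base = 0x30, disp = 0x0. The intended address is 0x30, but
// rounding down and adding 0x20 prefetches 0x50, one line too late.
static_assert(!same_cache_line(0x30 + 0x0, 0x30 + round_down_32(0x0) + 0x20),
              "rounding up can land on the next line");
```

Both assertions hold at compile time, which is the point of the argument: neither rounding direction is safe for an arbitrary base, so computing `base + disp` first and prefetching with a zero offset is the only variant that always targets the intended line.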
I can add a compile time check for the value of `$mem$$disp` and if a multiple of `32` then we can use the offset, otherwise we use `addi`. ------------- PR: https://git.openjdk.org/jdk/pull/10884 From coleenp at openjdk.org Thu Nov 3 15:59:33 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Nov 2022 15:59:33 GMT Subject: RFR: 8296231: Fix MEMFLAGS for CHeapBitMaps In-Reply-To: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> References: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> Message-ID: <DPlhqOu1G3ecHOllQf7SqO1gKvcxpYNtoiNHQNc0B2U=.ecc28c7c-5a9b-4230-9424-df9df1e2b664@github.com> On Wed, 2 Nov 2022 14:24:07 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Some usages of CHeapBitMaps rely on the default value of the MEMFLAGS argument (mtInternal). This is undesirable, and should be fixed. > > I'd prefer to remove the default value, but there is currently a PR touching the BitMap classes, so I'd like to limit this Bug to only fixing the incorrect usage of mtInternal. This looks reasonable. I vote no on a MemFlagsMark - we have too many Mark things that are hard to explain where to put them. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/10948 From duke at openjdk.org Thu Nov 3 16:24:04 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Thu, 3 Nov 2022 16:24:04 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions Message-ID: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> This is an attempt to unify the two different approaches for using archived heap regions. Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions. In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap. 
When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific. When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region-based collectors. This PR attempts to add a new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach the VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also gives the GC implementation the flexibility to decide where and how to reserve the space for the archived regions. For instance, the G1 implementation can continue to attempt to allocate the space towards the end of the heap. This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions, viz. G1, serial, parallel and epsilon.
------------- Commit messages: - Remove whitespace in filemap.cpp - Clean up unused code - Fix TestTracePageSizes to account for page size mismatch when heap - Differentiate between GC policies that can move archive regions around - Support for mapping archived heap with serial gc - Support for mapping archived heap with parallel gc - Support for mapping archived heap with epsilon gc - Add APIs for mapping archive heap regions and update G1GC to use new Changes: https://git.openjdk.org/jdk/pull/10970/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10970&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296263 Stats: 1814 lines in 39 files changed: 622 ins; 894 del; 298 mod Patch: https://git.openjdk.org/jdk/pull/10970.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10970/head:pull/10970 PR: https://git.openjdk.org/jdk/pull/10970 From duke at openjdk.org Thu Nov 3 16:24:05 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Thu, 3 Nov 2022 16:24:05 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> Message-ID: <9t4dPp0MBP2YP7ifwvbe3178iUFSUMnP9TwxaoYAbNI=.7b0c7a6d-3e8d-4017-a479-47199169dd01@github.com> On Thu, 3 Nov 2022 16:06:47 GMT, Ashutosh Mehra <duke at openjdk.org> wrote: > This is an attempt to unify the two different approaches for using archived heap regions. Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions. > > In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap. > When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. 
The APIs used for this purpose are G1 specific. > When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region-based collectors. > > This PR attempts to add a new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach the VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also gives the GC implementation the flexibility to decide where and how to reserve the space for the archived regions. For instance, the G1 implementation can continue to attempt to allocate the space towards the end of the heap. > This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions, viz. G1, serial, parallel and epsilon. I understand that reviewing all the changes together can be challenging, so I have tried to segregate the changes into different commits. I guess reviewing individual commits in the commit order should make it a bit easier.
------------- PR: https://git.openjdk.org/jdk/pull/10970 From aph at openjdk.org Thu Nov 3 16:31:29 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 3 Nov 2022 16:31:29 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references In-Reply-To: <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> Message-ID: <rNi-kFIlucDsCkWt1AGcDDz7wCDR-9USkGquiNM9_2I=.ed6b5beb-bd3c-478b-b2d4-622d476e8251@github.com> On Tue, 1 Nov 2022 23:51:08 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> This patch fixes the remaining null pointer dereference bugs that I know of. >> >> For the main bug, C2 was using a null reference to indicate an uninitialized `Node_List`. I replaced the null reference with a static sentinel. >> >> I also turned on `-fsanitize=null` and found and fixed a bunch of other null pointer dereferences. With this, I have run a full bootstrap and tier1 tests with `-fsanitize=null` enabled. >> >> I have checked that the code generated by GCC is not worse in any significant way, so I don't expect to see any performance regressions. >> >> I'd like to enable `-fsanitize=null` in debug builds to prevent regressions in this area. What do you think? > > src/hotspot/share/oops/instanceKlass.cpp line 390: > >> 388: // Record dependency to keep nest host from being unloaded before this class. >> 389: ClassLoaderData* this_key = class_loader_data(); >> 390: if (this_key != NULL) { > > The code assumes `this_key != NULL`. Do we need an assert/guarantee here? I did see this one trigger, otherwise I wouldn't have known about it, but I can't reproduce it today. Whether it's an assert or a guarantee depends on how serious the problem would be.
> src/hotspot/share/opto/node.hpp line 1528: > >> 1526: public: >> 1527: Node_Array(Arena* a, uint max = OptoNodeListSize) : _a(a), _max(max) { >> 1528: if (a != NULL) { > > Add `assert(a != NULL, "...")` here? This change is to allow the creation of a null sentinel. See `Node_List::_empty_list((Arena*)NULL)` ------------- PR: https://git.openjdk.org/jdk/pull/10920 From luhenry at openjdk.org Thu Nov 3 16:33:41 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 3 Nov 2022 16:33:41 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v3] In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> Message-ID: <HzRPXbsCH2qbB756Af3761Mnl0M3QJAIK0aroPyD9QQ=.59368208-e1e7-48f4-8685-b87a5d2d6383@github.com> > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. 
> > It passes `hotspot:tier1` test suite Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10884/files - new: https://git.openjdk.org/jdk/pull/10884/files/e968f716..0c8691b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=01-02 Stats: 14 lines in 2 files changed: 4 ins; 2 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/10884.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10884/head:pull/10884 PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Thu Nov 3 16:33:41 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 3 Nov 2022 16:33:41 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v2] In-Reply-To: <8B_7dzG0LXHqRpVHai6eCLPzGifNg8uradUiTXe4WXQ=.68426470-bfc4-43a0-8b67-51f349a39b2e@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <KJRBGLyEsrMHOwR4lQVvm-jr2GWHu_glKKNvb7PF7q0=.9d71a4d8-21d8-450b-8c4b-aed36bdb3cbb@github.com> <YGN0d-bKsd_caM3cSnqHFI-rWtczvFJ3nwAynvoxdmw=.3a7ec3d1-6e89-45d8-b148-d2906fd877b5@github.com> <5X8pxLpqNNSSUp0ESbFBUJPdqGjtLPhFeGp29uB2jDU=.b4f16caa-540d-41f9-96e3-0208569f8ad0@github.com> <MGQiYzqL8kdTvmaArf2gB47oICqxMv4VcC-njZr2e8E=.c28da921-869f-4db6-ac26-ba5e47ce05c9@github.com> <8B_7dzG0LXHqRpVHai6eCLPzGifNg8uradUiTXe4WXQ=.68426470-bfc4-43a0-8b67-51f349a39b2e@github.com> Message-ID: <JOo1T6YDjcCKpEVljR8Ciwrh-IFmygWhsvKxcdQ6v3c=.1a983f14-365f-4275-86a0-4803d760dc35@github.com> On Thu, 3 Nov 2022 15:18:44 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> But what if we are passed some $mem$$disp which is a multiple of 32 and thus satisfies the constraint? Then this "addi" instruction could be optimized out, right? Because we could encode $mem$$disp in the offset field.
> > We want to guarantee that `$mem$$base + $mem$$disp` is on the same cache line as `$mem$$base + ($mem$$disp & ~0x1f)` or even `$mem$$base + (($mem$$disp & ~0x1f) + 0x20)`. It's easy to find cases that trip these two possible solutions: > 1. for `$mem$$base + ($mem$$disp & ~0x1f)`, if `$mem$$base = 0x30` and `$mem$$disp = 0x10`, then `$mem$$base + $mem$$disp = 0x40`, while `$mem$$base + ($mem$$disp & ~0x1f) = 0x30`, which are not on the same 64-byte cache line. > 2. for `$mem$$base + ($mem$$disp & ~0x1f) + 0x20`, if `$mem$$base = 0x30` and `$mem$$disp = 0x0`, then `$mem$$base + $mem$$disp = 0x30`, while `$mem$$base + ($mem$$disp & ~0x1f) + 0x20 = 0x50`, which again are not on the same 64-byte cache line. > > The simplest and most accurate solution is the `__ addi` just before. Given that it's an immediate add and the disp values are at a maximum of `64 * CacheLineSize`, it's always going to fit in `addi` immediate. > > I can add a compile time check for the value of `$mem$$disp` and if a multiple of `32` then we can use the offset, otherwise we use `addi`. I've added a check for the value of `$mem$$disp`. ------------- PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Thu Nov 3 16:36:49 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 3 Nov 2022 16:36:49 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v4] In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> Message-ID: <YhdnZ-IdjKAnIrjsv__vuAuxMoemXVp9pALILaVLffs=.732b1b3c-2044-4e61-bdb7-dad6dd83549f@github.com> > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available.
> > It passes `hotspot:tier1` test suite Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: - fixup! remove dead code - remove dead code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10884/files - new: https://git.openjdk.org/jdk/pull/10884/files/0c8691b9..48ab8f31 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=02-03 Stats: 57 lines in 4 files changed: 0 ins; 57 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10884.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10884/head:pull/10884 PR: https://git.openjdk.org/jdk/pull/10884 From kbarrett at openjdk.org Thu Nov 3 17:20:37 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 3 Nov 2022 17:20:37 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v4] In-Reply-To: <B-1BbYX4v8UXrPWEFzp1pU9grDEVWdK1iG1DoK-rq5E=.44d1dd21-7bd8-4bf7-a034-17207044a33a@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <B-1BbYX4v8UXrPWEFzp1pU9grDEVWdK1iG1DoK-rq5E=.44d1dd21-7bd8-4bf7-a034-17207044a33a@github.com> Message-ID: <Fs9UjdFyuSfxFrJFd1n98ESUWD8vcvvoIGbIBSJSckk=.fdd54bd3-e17f-4928-9b0f-1df70a7a94f1@github.com> On Thu, 3 Nov 2022 12:46:00 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. >> Tested with tier1-6. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix has_no_hash into fast_no_hash_check(). Good thing Erik caught the displaced markword issue. Looks even better now. 
src/hotspot/share/oops/oop.hpp line 294: > 292: > 293: // identity hash; returns the identity hash key (computes it if necessary) > 294: inline bool fast_no_hash_check(); Seems like `fast_no_hash_check` ought to be later in this grouping. The preceding comment is about `identity_hash`. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/10938 From coleenp at openjdk.org Thu Nov 3 17:24:59 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Nov 2022 17:24:59 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v5] In-Reply-To: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> Message-ID: <kPc5tP7WLK1aibrgqiWu2Sm6VSKtDfPUFg4UND_OjLQ=.06ab492d-a612-4967-9e5a-eb98c25e169c@github.com> > Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. > Tested with tier1-6. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix has_no_hash into fast_no_hash_check(). 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/10938/files - new: https://git.openjdk.org/jdk/pull/10938/files/8770d38c..7bda861f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10938&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10938&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10938.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10938/head:pull/10938 PR: https://git.openjdk.org/jdk/pull/10938 From coleenp at openjdk.org Thu Nov 3 17:25:02 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Nov 2022 17:25:02 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v4] In-Reply-To: <Fs9UjdFyuSfxFrJFd1n98ESUWD8vcvvoIGbIBSJSckk=.fdd54bd3-e17f-4928-9b0f-1df70a7a94f1@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <B-1BbYX4v8UXrPWEFzp1pU9grDEVWdK1iG1DoK-rq5E=.44d1dd21-7bd8-4bf7-a034-17207044a33a@github.com> <Fs9UjdFyuSfxFrJFd1n98ESUWD8vcvvoIGbIBSJSckk=.fdd54bd3-e17f-4928-9b0f-1df70a7a94f1@github.com> Message-ID: <IOvzggKnx6IS_fA3ACGQdsLMaZ3OherEriwqnKk0yoo=.33d9fcf5-a514-4fdf-a26e-99777a471da1@github.com> On Thu, 3 Nov 2022 17:14:39 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix has_no_hash into fast_no_hash_check(). > > src/hotspot/share/oops/oop.hpp line 294: > >> 292: >> 293: // identity hash; returns the identity hash key (computes it if necessary) >> 294: inline bool fast_no_hash_check(); > > Seems like `fast_no_hash_check` ought to be later in this grouping. The preceding comment is about `identity_hash`. Ok, this makes sense. Thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/10938 From coleenp at openjdk.org Thu Nov 3 17:28:12 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Nov 2022 17:28:12 GMT Subject: RFR: 8256072: Eliminate JVMTI tagmap rehashing [v5] In-Reply-To: <kPc5tP7WLK1aibrgqiWu2Sm6VSKtDfPUFg4UND_OjLQ=.06ab492d-a612-4967-9e5a-eb98c25e169c@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> <kPc5tP7WLK1aibrgqiWu2Sm6VSKtDfPUFg4UND_OjLQ=.06ab492d-a612-4967-9e5a-eb98c25e169c@github.com> Message-ID: <vtqHuxbCiIv1wM2zwATFUXV8jFz2pJsdO8_i7SF2Nrs=.0c04e56c-161b-44df-b199-aed89dfd65a2@github.com> On Thu, 3 Nov 2022 17:24:59 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. >> Tested with tier1-6. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix has_no_hash into fast_no_hash_check(). Thanks for the re-review Kim. Recompiled with the trivial change. ------------- PR: https://git.openjdk.org/jdk/pull/10938 From coleenp at openjdk.org Thu Nov 3 17:30:33 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Nov 2022 17:30:33 GMT Subject: Integrated: 8256072: Eliminate JVMTI tagmap rehashing In-Reply-To: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> References: <e2pY-7vuf5oQQnczqpNXRM9Xj0quTTgkQ0mXNQCR1ok=.a736022c-5e03-4621-8d10-6a3504b5d652@github.com> Message-ID: <1O2rw3n-DpaBg_p13z3wUDImRCGKpKN2tKaLfFoZP5E=.c35305bc-1a06-42a0-8c8d-b3392a22a2ce@github.com> On Tue, 1 Nov 2022 22:31:03 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > Use identity_hash for objects in the JVMTI TagMap table. If the object has no hashcode, it's not in the table. > Tested with tier1-6. This pull request has now been integrated. 
Changeset: 94eb25a4 Author: Coleen Phillimore <coleenp at openjdk.org> URL: https://git.openjdk.org/jdk/commit/94eb25a4f1ffb0f8c834a03101d98fbff5dd0c5c Stats: 132 lines in 18 files changed: 13 ins; 113 del; 6 mod 8256072: Eliminate JVMTI tagmap rehashing Reviewed-by: kbarrett, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/10938 From stuefe at openjdk.org Thu Nov 3 18:17:30 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 3 Nov 2022 18:17:30 GMT Subject: RFR: 8296231: Fix MEMFLAGS for CHeapBitMaps In-Reply-To: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> References: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> Message-ID: <qd1FA1-x4NdxYlgG1TwcoTqFuzGpHDR3NMQ9S9Cvbs0=.c78d57a9-d00e-43a6-8d6f-633299392d27@github.com> On Wed, 2 Nov 2022 14:24:07 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Some usages of CHeapBitMaps rely on the default value of the MEMFLAGS argument (mtInternal). This is undesirable, and should be fixed. > > I'd prefer to remove the default value, but there is currently a PR touching the BitMap classes, so I'd like to limit this Bug to only fixing the incorrect usage of mtInternal. > I think it is unclear if this would be a net benefit for the code base. IMHO, it's easier to mess up by forgetting to add a MemFlagsMark, than if the compiler straight up told you that you need to provide a MEMFLAGS. It's also harder to look at a CHeap allocating call-site and figure out where the MemFlagsMark is located. > This looks reasonable. I vote no on a MemFlagsMark - we have too many Mark things that are hard to explain where to put them. No problem. It's good we talked about it before I sunk work into this. 
Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/10948 From aph at openjdk.org Thu Nov 3 18:34:18 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 3 Nov 2022 18:34:18 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v2] In-Reply-To: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> Message-ID: <z1RaBPfly1w0hkCD2SjW2duVLyLgwjCcStLw-69RHCA=.904f4e1e-ae62-46e2-9727-f8af65840817@github.com> > This patch fixes the remaining null pointer dereference bugs that I know of. > > For the main bug, C2 was using a null reference to indicate an uninitialized `Node_List`. I replaced the null reference with a static sentinel. > > I also turned on `-fsanitize=null` and found and fixed a bunch of other null pointer dereferences. With this, I have run a full bootstrap and tier1 tests with `-fsanitize=null` enabled. > > I have checked that the code generated by GCC is not worse in any significant way, so I don't expect to see any performance regressions. > > I'd like to enable `-fsanitize=null` in debug builds to prevent regressions in this area. What do you think?
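The sentinel approach described above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the HotSpot change itself: `NodeList`, `empty_list()`, `is_null()` and `count_or_zero()` are hypothetical names chosen for the example. The point it shows is that a single static object can stand in for "no list" so that no reference is ever bound to a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a list type; this is NOT the HotSpot Node_List.
class NodeList {
 public:
  NodeList() = default;

  // One static instance plays the role of "no list here".
  // Callers bind references to it instead of dereferencing a null pointer.
  static NodeList& empty_list() {
    static NodeList sentinel;
    return sentinel;
  }

  // True only for the sentinel object itself.
  bool is_null() const { return this == &empty_list(); }

  void push(int v) { _items.push_back(v); }
  std::size_t size() const { return _items.size(); }

 private:
  std::vector<int> _items;
};

// Takes a reference that is always bound to a real object,
// so the "missing list" case is detected without undefined behavior.
std::size_t count_or_zero(const NodeList& list) {
  if (list.is_null()) {
    return 0;
  }
  return list.size();
}
```

A caller with nothing to pass would hand over `NodeList::empty_list()`; the callee checks `is_null()` rather than comparing the address of its reference parameter against null.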
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Push ScopedValue tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10920/files - new: https://git.openjdk.org/jdk/pull/10920/files/5fe47361..d298edfa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10920&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10920&range=00-01 Stats: 1521 lines in 9 files changed: 1521 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10920.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10920/head:pull/10920 PR: https://git.openjdk.org/jdk/pull/10920 From aph at openjdk.org Thu Nov 3 18:40:33 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 3 Nov 2022 18:40:33 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v3] In-Reply-To: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> Message-ID: <n_G07QKabkGRoc17hqRJz8RZZfgik4VasUhnTBecWi0=.bee22124-927e-4a95-8286-97088de7592b@github.com> > This patch fixes the remaining null pointer dereference bugs that I know of. > > For the main bug, C2 was using a null reference to indicate an uninitialized `Node_List`. I replaced the null reference with a static sentinel. > > I also turned on `-fsanitize=null` and found and fixed a bunch of other null pointer dereferences. With this, I have run a full bootstrap and tier1 tests with `-fsanitize=null` enabled. > > I have checked that the code generated by GCC is not worse in any significant way, so I don't expect to see any performance regressions. > > I'd like to enable `-fsanitize=null` in debug builds to prevent regressions in this area. What do you think?
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Revert "Push ScopedValue tests" This reverts commit d298edfa9eda48ace9a27f83d38320fe6ba79e67. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10920/files - new: https://git.openjdk.org/jdk/pull/10920/files/d298edfa..82b99586 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10920&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10920&range=01-02 Stats: 1521 lines in 9 files changed: 0 ins; 1521 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10920.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10920/head:pull/10920 PR: https://git.openjdk.org/jdk/pull/10920 From iklam at openjdk.org Thu Nov 3 20:02:29 2022 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 3 Nov 2022 20:02:29 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> Message-ID: <mfMzz_7M2a2XG_d4SE0BNCP-o52KfQlWelKJwusTJwA=.3064535d-df7b-47fc-aa0f-b85e261f7a33@github.com> On Thu, 3 Nov 2022 16:06:47 GMT, Ashutosh Mehra <duke at openjdk.org> wrote: > This is an attempt to unify the two different approaches for using archived heap regions. Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions. > > In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap. > When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific. 
> When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region based collectors. > > This PR attempts to add new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also allows the flexibility for the GC implementation to decide where and how to reserve the space for the archived regions. For instance, G1 implementation can continue to attempt to allocate the space towards the end of the heap. > This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions viz G1, serial, parallel and epsilon. Hi Ashutosh, I've been trying out your patch and reading the code. So far, I found a few problems: - With debug VM, `java -Xlog:cds -XX:+UseSerialGC --version` spends a long time at VM exit to verify the heap - I cannot run the jtreg tests with a fastdebug or slowdebug VM in agentvm mode. I.e., `jtreg .... -agentvm HelloTest.java`. Product VM works fine in agentvm mode. 
- These two test cases crash with a product VM: - runtime/cds/appcds/javaldr/GCSharedStringsDuringDump.java - runtime/cds/appcds/sharedStrings/SharedStringsStress.java ------------- PR: https://git.openjdk.org/jdk/pull/10970 From dlong at openjdk.org Thu Nov 3 21:30:30 2022 From: dlong at openjdk.org (Dean Long) Date: Thu, 3 Nov 2022 21:30:30 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v3] In-Reply-To: <-ycncxIQyNpinFjD3jcSBESB-Yp4GGZSdZ0Eqvva7DU=.2871f915-cf60-4364-abc8-9e9d22ed8709@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <VUwT5X0MCtAVQ0D531lePcYXx8zOI1ckRh63_AzhHRU=.3421aa87-a000-4539-9a9d-b8091df5ac9f@github.com> <-ycncxIQyNpinFjD3jcSBESB-Yp4GGZSdZ0Eqvva7DU=.2871f915-cf60-4364-abc8-9e9d22ed8709@github.com> Message-ID: <763IXbUVc8wSoGTW7KQz0mUSEQQ4B7WrBUGBtdGVEM4=.2d0a35ec-253b-4df4-ba27-5dc910070f05@github.com> On Thu, 3 Nov 2022 03:18:57 GMT, Fei Yang <fyang at openjdk.org> wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1053: >> >>> 1051: >>> 1052: intptr_t* const stack_frame_bottom = ContinuationHelper::InterpretedFrame::frame_bottom(f); >>> 1053: assert(stack_frame_bottom - stack_frame_top >= fsize, ""); // == on x86 >> >> Are there any ports where (stack_frame_bottom - stack_frame_top) != fsize? It would be nice if we could use (stack_frame_bottom - stack_frame_top) for fsize and remove the platform-specific computation using frame::metadata_words above. > > I think aarch64 and riscv are different from x86_64 here due to possible padding in the frame[1][2]. 
> So if we modify this assertion like: > > diff --git a/src/hotspot/share/runtime/continuationFreezeThaw.cpp b/src/hotspot/share/runtime/continuationFreezeThaw.cpp > index 2ef48618ccb..c8e88b67f94 100644 > --- a/src/hotspot/share/runtime/continuationFreezeThaw.cpp > +++ b/src/hotspot/share/runtime/continuationFreezeThaw.cpp > @@ -1050,7 +1050,7 @@ NOINLINE freeze_result FreezeBase::recurse_freeze_interpreted_frame(frame& f, fr > const int fsize = f.fp() + frame::metadata_words + locals - stack_frame_top; > > intptr_t* const stack_frame_bottom = ContinuationHelper::InterpretedFrame::frame_bottom(f); > - assert(stack_frame_bottom - stack_frame_top >= fsize, ""); // == on x86 > + assert(stack_frame_bottom - stack_frame_top == fsize, ""); // == on x86 > > DEBUG_ONLY(verify_frame_top(f, stack_frame_top)); > > > Then we will trigger assertion failure on linux-aarch64 running a simple virtual thread demo: > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/realfyang/openjdk-jdk/src/hotspot/share/runtime/continuationFreezeThaw.cpp > :1053), pid=2680946, tid=2680964 > # Error: assert(stack_frame_bottom - stack_frame_top == fsize) failed > # > # JRE version: OpenJDK Runtime Environment (20.0) (slowdebug build 20-internal-adhoc.realfyang.open > jdk-jdk) > # Java VM: OpenJDK 64-Bit Server VM (slowdebug 20-internal-adhoc.realfyang.openjdk-jdk, mixed mode, > sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # V [libjvm.so+0x865c98] FreezeBase::recurse_freeze_interpreted_frame(frame&, frame&, int, bool)+ > 0xb8 > # > > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/frame_aarch64.hpp#L62 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/frame_riscv.hpp#L62 OK, skipping the assert for now. 
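A toy model may help show why the assertion is written with `>=` rather than `==`. If a port pads frames up to an alignment boundary, the distance between frame bottom and frame top can exceed the computed `fsize`. The helpers below are hypothetical, and the two-word alignment used in the usage note is an assumption for illustration only; it is not taken from the aarch64 or riscv frame layouts linked above.

```cpp
#include <cassert>
#include <cstddef>

// Round a size in words up to the next multiple of `alignment_words`.
constexpr std::size_t align_up_words(std::size_t words, std::size_t alignment_words) {
  return (words + alignment_words - 1) / alignment_words * alignment_words;
}

// Words actually occupied on the stack by a frame whose payload is `fsize`
// words, when the port pads every frame to `alignment_words`.
constexpr std::size_t frame_extent_words(std::size_t fsize, std::size_t alignment_words) {
  return align_up_words(fsize, alignment_words);
}
```

With an alignment of one word the extent always equals `fsize`, which corresponds to the `== on x86` remark in the source comment. With an alignment of two words, a 7-word payload occupies 8 words, so only the `>=` form of the check holds.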
Regarding `NOT_RISCV64(+ frame::metadata_words)` @reinrich just posted the PR for the PPC64 port (see JDK-8286302), which introduces metadata_words_at_top/metadata_words_at_bottom. I'm not sure what values RISC-V would use for these new constants, but merging with the PPC64 changes might allow a platform-independent setting of fsize here. ------------- PR: https://git.openjdk.org/jdk/pull/10917 From duke at openjdk.org Thu Nov 3 21:40:30 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Thu, 3 Nov 2022 21:40:30 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <mfMzz_7M2a2XG_d4SE0BNCP-o52KfQlWelKJwusTJwA=.3064535d-df7b-47fc-aa0f-b85e261f7a33@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> <mfMzz_7M2a2XG_d4SE0BNCP-o52KfQlWelKJwusTJwA=.3064535d-df7b-47fc-aa0f-b85e261f7a33@github.com> Message-ID: <0_IjxgOT11iofjMvvZdO4o6mJXBEHJmDj3T5IwD8Yfw=.008a1291-2c56-4dc3-aaa4-08448dfb8ba1@github.com> On Thu, 3 Nov 2022 19:58:41 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> This is an attempt to unify the two different approaches for using archived heap regions. Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions. >> >> In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap. >> When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific. >> When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region based collectors. 
>> >> This PR attempts to add new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also allows the flexibility for the GC implementation to decide where and how to reserve the space for the archived regions. For instance, G1 implementation can continue to attempt to allocate the space towards the end of the heap. >> This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions viz G1, serial, parallel and epsilon. > > Hi Ashutosh, I've been trying out your patch and reading the code. So far, I found a few problems: > > - With debug VM, `java -Xlog:cds -XX:+UseSerialGC --version` spends a long time at VM exit to verify the heap > - I cannot run the jtreg tests with a fastdebug or slowdebug VM in agentvm mode. I.e., `jtreg .... -agentvm HelloTest.java`. Product VM works fine in agentvm mode. > - These two test cases crash with a product VM: > - runtime/cds/appcds/javaldr/GCSharedStringsDuringDump.java > - runtime/cds/appcds/sharedStrings/SharedStringsStress.java @iklam thanks for taking a look at the patches and trying them out. > With debug VM, java -Xlog:cds -XX:+UseSerialGC --version spends a long time at VM exit to verify the heap Ah right! I observed this locally too, but it somehow slipped under my radar. I need to check what's going on there. > These two test cases crash with a product VM: > - runtime/cds/appcds/javaldr/GCSharedStringsDuringDump.java > - runtime/cds/appcds/sharedStrings/SharedStringsStress.java I can recreate these failures locally with a fastdebug build as well. Looking at them right now.
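The design point in the quoted PR description, that the caller only states how much space it needs while the GC policy picks the address, can be sketched as follows. Every name here (`Reservation`, `HeapPolicy`, `reserve_archive_space`, `ReserveFromEnd`, `ReserveFromStart`) is hypothetical and exists only for this illustration; none of it is the API proposed in the PR, and the heap is modelled as plain offsets rather than real memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only; none of these names exist in HotSpot.
struct Reservation {
  std::size_t offset;  // start of the reserved range, as an offset into the heap
  std::size_t bytes;   // size of the reserved range
};

class HeapPolicy {
 public:
  virtual ~HeapPolicy() = default;
  // The caller says how much it needs; the policy decides where it goes.
  // Returns false if the request cannot be satisfied.
  virtual bool reserve_archive_space(std::size_t bytes, Reservation* out) = 0;
};

// Policy that hands out space from the end of the heap downwards.
class ReserveFromEnd : public HeapPolicy {
 public:
  explicit ReserveFromEnd(std::size_t heap_bytes) : _top(heap_bytes) {}
  bool reserve_archive_space(std::size_t bytes, Reservation* out) override {
    if (bytes > _top) {
      return false;  // not enough room left below the current top
    }
    _top -= bytes;
    out->offset = _top;
    out->bytes = bytes;
    return true;
  }
 private:
  std::size_t _top;  // everything at or above this offset is already reserved
};

// Policy that hands out space from the start of the heap upwards.
class ReserveFromStart : public HeapPolicy {
 public:
  explicit ReserveFromStart(std::size_t heap_bytes) : _bottom(0), _limit(heap_bytes) {}
  bool reserve_archive_space(std::size_t bytes, Reservation* out) override {
    if (bytes > _limit - _bottom) {
      return false;  // not enough room left above the current bottom
    }
    out->offset = _bottom;
    out->bytes = bytes;
    _bottom += bytes;
    return true;
  }
 private:
  std::size_t _bottom;  // everything below this offset is already reserved
  std::size_t _limit;   // total heap size
};
```

The calling code is identical for both policies; only the chosen offsets differ, which is the flexibility the description attributes to leaving the placement decision with the GC.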
------------- PR: https://git.openjdk.org/jdk/pull/10970 From matsaave at openjdk.org Thu Nov 3 22:22:30 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 3 Nov 2022 22:22:30 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v2] In-Reply-To: <pEWSILJVbquVykKPd2WwaP-BYy_OJUTsD0rG1vzEcKk=.83be8135-6acd-44ce-a49f-21e79649a2db@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> <1bfoEKP2K6f7eJ-FyU5uNN7L2TWWxUVX8U_x0FJitDY=.d7188708-562e-48f6-a18a-8d244d4dd42c@github.com> <pEWSILJVbquVykKPd2WwaP-BYy_OJUTsD0rG1vzEcKk=.83be8135-6acd-44ce-a49f-21e79649a2db@github.com> Message-ID: <Jg01Z-HGAyztVsd_9YTewB66OS1B5YkiU3h_8hCD4Ak=.7309bf6f-e747-4d48-b51c-cdca2c49c485@github.com> On Thu, 27 Oct 2022 05:55:50 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Added null check and resource mark > > For testing the handling of uninitialized ConstantPoolEntry's, you can do something like this: > > - set a breakpoint at InstanceKlass::initialize_impl in gdb > - when the breakpoint is hit, check if constants()->cache() is NULL > - if not NULL, do this: `call constants()->cache()->print_on(tty)` > > This should catch the problem in your earlier commit where you didn't check for the NULL Method pointer. Thank you for the corrections and feedback @iklam @coleenp @dholmes-ora! 
------------- PR: https://git.openjdk.org/jdk/pull/10860 From vlivanov at openjdk.org Thu Nov 3 22:30:57 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 3 Nov 2022 22:30:57 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v3] In-Reply-To: <rNi-kFIlucDsCkWt1AGcDDz7wCDR-9USkGquiNM9_2I=.ed6b5beb-bd3c-478b-b2d4-622d476e8251@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> <rNi-kFIlucDsCkWt1AGcDDz7wCDR-9USkGquiNM9_2I=.ed6b5beb-bd3c-478b-b2d4-622d476e8251@github.com> Message-ID: <GXcLOnn_At_5Cej39psYTY58lWVL8HeQHhr8hUN7a5E=.0125c2af-6ea8-4b4b-bf04-ae5981b81c02@github.com> On Thu, 3 Nov 2022 16:27:07 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/share/opto/node.hpp line 1528: >> >>> 1526: public: >>> 1527: Node_Array(Arena* a, uint max = OptoNodeListSize) : _a(a), _max(max) { >>> 1528: if (a != NULL) { >> >> Add `assert(a != NULL, "...")` here? > > This change is to allow the creation of a null sentinel. See `Node_List::_empty_list((Arena*)NULL)` Oh, I see it now... Considering `Node_List::_empty_list` is effectively unusable (except for `is_null()` query), I'd prefer to see `postaloc.cpp` migrated away from references to pointers when it comes to `Node_List`. It already does ugly things like [1] which your patch doesn't handle yet. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/postaloc.cpp#L260 How about the following patch for `postaloc.cpp`? Does it solve your problem? 
diff --git a/src/hotspot/share/opto/postaloc.cpp b/src/hotspot/share/opto/postaloc.cpp index 96c30a122bb..10c9d1f90ae 100644 --- a/src/hotspot/share/opto/postaloc.cpp +++ b/src/hotspot/share/opto/postaloc.cpp @@ -77,7 +77,7 @@ bool PhaseChaitin::may_be_copy_of_callee( Node *def ) const { //------------------------------yank----------------------------------- // Helper function for yank_if_dead -int PhaseChaitin::yank( Node *old, Block *current_block, Node_List *value, Node_List *regnd ) { +int PhaseChaitin::yank(Node *old, Block *current_block, Node_List *value, Node_List *regnd) { int blk_adjust=0; Block *oldb = _cfg.get_block_for_node(old); oldb->find_remove(old); @@ -87,9 +87,9 @@ int PhaseChaitin::yank( Node *old, Block *current_block, Node_List *value, Node_ } _cfg.unmap_node_from_block(old); OptoReg::Name old_reg = lrgs(_lrg_map.live_range_id(old)).reg(); - if( regnd && (*regnd)[old_reg]==old ) { // Instruction is currently available? - value->map(old_reg,NULL); // Yank from value/regnd maps - regnd->map(old_reg,NULL); // This register's value is now unknown + if (regnd != NULL && regnd->at(old_reg) == old) { // Instruction is currently available? + value->map(old_reg, NULL); // Yank from value/regnd maps + regnd->map(old_reg, NULL); // This register's value is now unknown } return blk_adjust; } @@ -161,7 +161,7 @@ int PhaseChaitin::yank_if_dead_recurse(Node *old, Node *orig_old, Block *current // Use the prior value instead of the current value, in an effort to make // the current value go dead. Return block iterator adjustment, in case // we yank some instructions from this block. -int PhaseChaitin::use_prior_register( Node *n, uint idx, Node *def, Block *current_block, Node_List &value, Node_List &regnd ) { +int PhaseChaitin::use_prior_register( Node *n, uint idx, Node *def, Block *current_block, Node_List *value, Node_List *regnd ) { // No effect? if( def == n->in(idx) ) return 0; // Def is currently dead and can be removed?
Do not resurrect @@ -207,7 +207,7 @@ int PhaseChaitin::use_prior_register( Node *n, uint idx, Node *def, Block *curre _post_alloc++; // Is old def now dead? We successfully yanked a copy? - return yank_if_dead(old,current_block,&value,&regnd); + return yank_if_dead(old,current_block,value,regnd); } @@ -229,7 +229,7 @@ Node *PhaseChaitin::skip_copies( Node *c ) { //------------------------------elide_copy------------------------------------- // Remove (bypass) copies along Node n, edge k. -int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &value, Node_List &regnd, bool can_change_regs ) { +int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List *value, Node_List *regnd, bool can_change_regs ) { int blk_adjust = 0; uint nk_idx = _lrg_map.live_range_id(n->in(k)); @@ -253,12 +253,13 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v // Phis and 2-address instructions cannot change registers so easily - their // outputs must match their input. - if( !can_change_regs ) + if (!can_change_regs) { return blk_adjust; // Only check stupid copies! - + } // Loop backedges won't have a value-mapping yet - if( &value == NULL ) return blk_adjust; - + if (value == NULL) { + return blk_adjust; + } // Skip through all copies to the _value_ being used. Do not change from // int to pointer. This attempts to jump through a chain of copies, where // intermediate copies might be illegal, i.e., value is stored down to stack @@ -273,10 +274,11 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v // See if it happens to already be in the correct register! // (either Phi's direct register, or the common case of the name // never-clobbered original-def register) - if (register_contains_value(val, val_reg, n_regs, value)) { - blk_adjust += use_prior_register(n,k,regnd[val_reg],current_block,value,regnd); - if( n->in(k) == regnd[val_reg] ) // Success!
Quit trying - return blk_adjust; + if (register_contains_value(val, val_reg, n_regs, *value)) { + blk_adjust += use_prior_register(n,k,regnd->at(val_reg),current_block,value,regnd); + if (n->in(k) == regnd->at(val_reg)) { + return blk_adjust; // Success! Quit trying + } } // See if we can skip the copy by changing registers. Don't change from @@ -304,7 +306,7 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v if (ignore_self) continue; } - Node *vv = value[reg]; + Node *vv = value->at(reg); // For scalable register, number of registers may be inconsistent between // "val_reg" and "reg". For example, when "val" resides in register // but "reg" is located in stack. @@ -325,7 +327,7 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v last = (n_regs-1); // Looking for the last part of a set } if ((reg&last) != last) continue; // Wrong part of a set - if (!register_contains_value(vv, reg, n_regs, value)) continue; // Different value + if (!register_contains_value(vv, reg, n_regs, *value)) continue; // Different value } if( vv == val || // Got a direct hit? (t && vv && vv->bottom_type() == t && vv->is_Mach() && @@ -333,9 +335,9 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v assert( !n->is_Phi(), "cannot change registers at a Phi so easily" ); if( OptoReg::is_stack(nk_reg) || // CISC-loading from stack OR OptoReg::is_reg(reg) || // turning into a register use OR - regnd[reg]->outcnt()==1 ) { // last use of a spill-load turns into a CISC use - blk_adjust += use_prior_register(n,k,regnd[reg],current_block,value,regnd); - if( n->in(k) == regnd[reg] ) // Success! Quit trying + regnd->at(reg)->outcnt()==1 ) { // last use of a spill-load turns into a CISC use + blk_adjust += use_prior_register(n,k,regnd->at(reg),current_block,value,regnd); + if( n->in(k) == regnd->at(reg) ) // Success! 
Quit trying return blk_adjust; } // End of if not degrading to a stack } // End of if found value in another register @@ -535,7 +537,7 @@ void PhaseChaitin::post_allocate_copy_removal() { Block* pb = _cfg.get_block_for_node(block->pred(j)); // Remove copies along phi edges for (uint k = 1; k < phi_dex; k++) { - elide_copy(block->get_node(k), j, block, *blk2value[pb->_pre_order], *blk2regnd[pb->_pre_order], false); + elide_copy(block->get_node(k), j, block, blk2value[pb->_pre_order], blk2regnd[pb->_pre_order], false); } if (blk2value[pb->_pre_order]) { // Have a mapping on this edge? // See if this predecessor's mappings have been used by everybody @@ -691,7 +693,7 @@ void PhaseChaitin::post_allocate_copy_removal() { // Remove copies along input edges for (k = 1; k < n->req(); k++) { - j -= elide_copy(n, k, block, value, regnd, two_adr != k); + j -= elide_copy(n, k, block, &value, &regnd, two_adr != k); } // Unallocated Nodes define no registers ------------- PR: https://git.openjdk.org/jdk/pull/10920 From iklam at openjdk.org Thu Nov 3 23:22:26 2022 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 3 Nov 2022 23:22:26 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> Message-ID: <gvilFqItYZLXc4baWTlvEyb1w0enJmhag56irgRDOog=.978479f5-b7e1-437d-9016-1a7bfb7ed180@github.com> On Thu, 3 Nov 2022 16:06:47 GMT, Ashutosh Mehra <duke at openjdk.org> wrote: > This is an attempt to unify the two different approaches for using archived heap regions. Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions. > > In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap.
> When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific. > When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region based collectors. > > This PR attempts to add new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also allows the flexibility for the GC implementation to decide where and how to reserve the space for the archived regions. For instance, G1 implementation can continue to attempt to allocate the space towards the end of the heap. > This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions viz G1, serial, parallel and epsilon. By the way, one enhancement that I've been considering (which will be independent of this PR) is to remove the dependency on G1 for writing the archived heap. 
I've created https://bugs.openjdk.org/browse/JDK-8296344 ------------- PR: https://git.openjdk.org/jdk/pull/10970 From vlivanov at openjdk.org Thu Nov 3 23:59:27 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 3 Nov 2022 23:59:27 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v3] In-Reply-To: <GXcLOnn_At_5Cej39psYTY58lWVL8HeQHhr8hUN7a5E=.0125c2af-6ea8-4b4b-bf04-ae5981b81c02@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> <rNi-kFIlucDsCkWt1AGcDDz7wCDR-9USkGquiNM9_2I=.ed6b5beb-bd3c-478b-b2d4-622d476e8251@github.com> <GXcLOnn_At_5Cej39psYTY58lWVL8HeQHhr8hUN7a5E=.0125c2af-6ea8-4b4b-bf04-ae5981b81c02@github.com> Message-ID: <WBzNXPj7q8bjEEbF7X8GnyMKZ0h6SV1mEnnoj9il3EA=.8183b8cb-c2d9-4e57-b5bc-64fb1d959d14@github.com> On Thu, 3 Nov 2022 22:25:38 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> This change is to allow the creation of a null sentinel. See `Node_List::_empty_list((Arena*)NULL)` > > Oh, I see it now... > > Considering `Node_List::_empty_list` is effectively unusable (except for `is_null()` query), I'd prefer to see `postaloc.cpp` migrated away from references to pointers when it comes to `Node_List`. It already does ugly things like [1] which your patch doesn't handle yet. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/postaloc.cpp#L260 > > How about the following patch for `postaloc.cpp`? Does it solve your problem? 
> > diff --git a/src/hotspot/share/opto/postaloc.cpp b/src/hotspot/share/opto/postaloc.cpp > index 96c30a122bb..10c9d1f90ae 100644 > --- a/src/hotspot/share/opto/postaloc.cpp > +++ b/src/hotspot/share/opto/postaloc.cpp > @@ -77,7 +77,7 @@ bool PhaseChaitin::may_be_copy_of_callee( Node *def ) const { > > //------------------------------yank----------------------------------- > // Helper function for yank_if_dead > -int PhaseChaitin::yank( Node *old, Block *current_block, Node_List *value, Node_List *regnd ) { > +int PhaseChaitin::yank(Node *old, Block *current_block, Node_List *value, Node_List *regnd) { > int blk_adjust=0; > Block *oldb = _cfg.get_block_for_node(old); > oldb->find_remove(old); > @@ -87,9 +87,9 @@ int PhaseChaitin::yank( Node *old, Block *current_block, Node_List *value, Node_ > } > _cfg.unmap_node_from_block(old); > OptoReg::Name old_reg = lrgs(_lrg_map.live_range_id(old)).reg(); > - if( regnd && (*regnd)[old_reg]==old ) { // Instruction is currently available? > - value->map(old_reg,NULL); // Yank from value/regnd maps > - regnd->map(old_reg,NULL); // This register's value is now unknown > + if (regnd != NULL && regnd->at(old_reg) == old) { // Instruction is currently available? > + value->map(old_reg, NULL); // Yank from value/regnd maps > + regnd->map(old_reg, NULL); // This register's value is now unknown > } > return blk_adjust; > } > @@ -161,7 +161,7 @@ int PhaseChaitin::yank_if_dead_recurse(Node *old, Node *orig_old, Block *current > // Use the prior value instead of the current value, in an effort to make > // the current value go dead. Return block iterator adjustment, in case > // we yank some instructions from this block. > -int PhaseChaitin::use_prior_register( Node *n, uint idx, Node *def, Block *current_block, Node_List &value, Node_List &regnd ) { > +int PhaseChaitin::use_prior_register( Node *n, uint idx, Node *def, Block *current_block, Node_List *value, Node_List *regnd ) { > // No effect?
> if( def == n->in(idx) ) return 0; > // Def is currently dead and can be removed? Do not resurrect > @@ -207,7 +207,7 @@ int PhaseChaitin::use_prior_register( Node *n, uint idx, Node *def, Block *curre > _post_alloc++; > > // Is old def now dead? We successfully yanked a copy? > - return yank_if_dead(old,current_block,&value,&regnd); > + return yank_if_dead(old,current_block,value,regnd); > } > > > @@ -229,7 +229,7 @@ Node *PhaseChaitin::skip_copies( Node *c ) { > > //------------------------------elide_copy------------------------------------- > // Remove (bypass) copies along Node n, edge k. > -int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &value, Node_List &regnd, bool can_change_regs ) { > +int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List *value, Node_List *regnd, bool can_change_regs ) { > int blk_adjust = 0; > > uint nk_idx = _lrg_map.live_range_id(n->in(k)); > @@ -253,12 +253,13 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v > > // Phis and 2-address instructions cannot change registers so easily - their > // outputs must match their input. > - if( !can_change_regs ) > + if (!can_change_regs) { > return blk_adjust; // Only check stupid copies! > - > + } > // Loop backedges won't have a value-mapping yet > - if( &value == NULL ) return blk_adjust; > - > + if (value == NULL) { > + return blk_adjust; > + } > // Skip through all copies to the _value_ being used. Do not change from > // int to pointer. This attempts to jump through a chain of copies, where > // intermediate copies might be illegal, i.e., value is stored down to stack > @@ -273,10 +274,11 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v > // See if it happens to already be in the correct register!
> // (either Phi's direct register, or the common case of the name > // never-clobbered original-def register) > - if (register_contains_value(val, val_reg, n_regs, value)) { > - blk_adjust += use_prior_register(n,k,regnd[val_reg],current_block,value,regnd); > - if( n->in(k) == regnd[val_reg] ) // Success! Quit trying > - return blk_adjust; > + if (register_contains_value(val, val_reg, n_regs, *value)) { > + blk_adjust += use_prior_register(n,k,regnd->at(val_reg),current_block,value,regnd); > + if (n->in(k) == regnd->at(val_reg)) { > + return blk_adjust; // Success! Quit trying > + } > } > > // See if we can skip the copy by changing registers. Don't change from > @@ -304,7 +306,7 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v > if (ignore_self) continue; > } > > - Node *vv = value[reg]; > + Node *vv = value->at(reg); > // For scalable register, number of registers may be inconsistent between > // "val_reg" and "reg". For example, when "val" resides in register > // but "reg" is located in stack. > @@ -325,7 +327,7 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v > last = (n_regs-1); // Looking for the last part of a set > } > if ((reg&last) != last) continue; // Wrong part of a set > - if (!register_contains_value(vv, reg, n_regs, value)) continue; // Different value > + if (!register_contains_value(vv, reg, n_regs, *value)) continue; // Different value > } > if( vv == val || // Got a direct hit? 
> (t && vv && vv->bottom_type() == t && vv->is_Mach() && > @@ -333,9 +335,9 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v > assert( !n->is_Phi(), "cannot change registers at a Phi so easily" ); > if( OptoReg::is_stack(nk_reg) || // CISC-loading from stack OR > OptoReg::is_reg(reg) || // turning into a register use OR > - regnd[reg]->outcnt()==1 ) { // last use of a spill-load turns into a CISC use > - blk_adjust += use_prior_register(n,k,regnd[reg],current_block,value,regnd); > - if( n->in(k) == regnd[reg] ) // Success! Quit trying > + regnd->at(reg)->outcnt()==1 ) { // last use of a spill-load turns into a CISC use > + blk_adjust += use_prior_register(n,k,regnd->at(reg),current_block,value,regnd); > + if( n->in(k) == regnd->at(reg) ) // Success! Quit trying > return blk_adjust; > } // End of if not degrading to a stack > } // End of if found value in another register > @@ -535,7 +537,7 @@ void PhaseChaitin::post_allocate_copy_removal() { > Block* pb = _cfg.get_block_for_node(block->pred(j)); > // Remove copies along phi edges > for (uint k = 1; k < phi_dex; k++) { > - elide_copy(block->get_node(k), j, block, *blk2value[pb->_pre_order], *blk2regnd[pb->_pre_order], false); > + elide_copy(block->get_node(k), j, block, blk2value[pb->_pre_order], blk2regnd[pb->_pre_order], false); > } > if (blk2value[pb->_pre_order]) { // Have a mapping on this edge? > // See if this predecessor's mappings have been used by everybody > @@ -691,7 +693,7 @@ void PhaseChaitin::post_allocate_copy_removal() { > > // Remove copies along input edges > for (k = 1; k < n->req(); k++) { > - j -= elide_copy(n, k, block, value, regnd, two_adr != k); > + j -= elide_copy(n, k, block, &value, &regnd, two_adr != k); > } > > // Unallocated Nodes define no registers Sorry, missed a couple of null checks.
The following patch on top of the previous one passes hs-tier1/2: diff --git a/src/hotspot/share/opto/postaloc.cpp b/src/hotspot/share/opto/postaloc.cpp index 10c9d1f90ae..b39a78eef48 100644 --- a/src/hotspot/share/opto/postaloc.cpp +++ b/src/hotspot/share/opto/postaloc.cpp @@ -87,7 +87,8 @@ int PhaseChaitin::yank(Node *old, Block *current_block, Node_List *value, Node_L } _cfg.unmap_node_from_block(old); OptoReg::Name old_reg = lrgs(_lrg_map.live_range_id(old)).reg(); - if (regnd != NULL && regnd->at(old_reg) == old) { // Instruction is currently available? + assert(value != NULL || regnd == NULL, "sanity"); + if (value != NULL && regnd != NULL && regnd->at(old_reg) == old) { // Instruction is currently available? value->map(old_reg, NULL); // Yank from value/regnd maps regnd->map(old_reg, NULL); // This register's value is now unknown } @@ -257,7 +258,8 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List *v return blk_adjust; // Only check stupid copies! } // Loop backedges won't have a value-mapping yet - if (value == NULL) { + assert(regnd != NULL || value == NULL, "sanity"); + if (value == NULL || regnd == NULL) { return blk_adjust; } // Skip through all copies to the _value_ being used. 
Do not change from ------------- PR: https://git.openjdk.org/jdk/pull/10920 From vlivanov at openjdk.org Fri Nov 4 00:04:30 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 4 Nov 2022 00:04:30 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v3] In-Reply-To: <rNi-kFIlucDsCkWt1AGcDDz7wCDR-9USkGquiNM9_2I=.ed6b5beb-bd3c-478b-b2d4-622d476e8251@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> <rNi-kFIlucDsCkWt1AGcDDz7wCDR-9USkGquiNM9_2I=.ed6b5beb-bd3c-478b-b2d4-622d476e8251@github.com> Message-ID: <HN_ppGL2xcGGzXWAs7-lw9HxLJ063u7gtfIYKB2Rg0I=.78843e37-397b-4e1f-867f-bbe974d4d7cf@github.com> On Thu, 3 Nov 2022 16:23:45 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 390: >> >>> 388: // Record dependency to keep nest host from being unloaded before this class. >>> 389: ClassLoaderData* this_key = class_loader_data(); >>> 390: if (this_key != NULL) { >> >> The code assumes `this_key != NULL`. Do we need an assert/guarantee here? > > I did see this one trigger, otherwise I wouldn't have known about it, but I can't reproduce it today. Whether it's an assert or a guarantee depends on how serious the problem would be. Interesting! 
I do hit the assert during JDK build: # Internal Error (.../src/hotspot/share/oops/instanceKlass.cpp:390), pid=956, tid=6147 # Error: assert(this_key != __null) failed V report_vm_error(char const*, int, char const*, char const*, ...)+0x88 V InstanceKlass::set_nest_host(InstanceKlass*)+0x254 V SystemDictionary::load_shared_lambda_proxy_class(InstanceKlass*, Handle, Handle, PackageEntry*, JavaThread*)+0x19c V SystemDictionaryShared::prepare_shared_lambda_proxy_class(InstanceKlass*, InstanceKlass*, JavaThread*)+0x13c V JVM_LookupLambdaProxyClassFromArchive+0x2cc C Java_java_lang_invoke_LambdaProxyClassArchive_findFromArchive+0x4c j java.lang.invoke.LambdaProxyClassArchive.findFromArchive(...) java.base at 20-internal ... Looks like a pre-existing bug to me. ------------- PR: https://git.openjdk.org/jdk/pull/10920 From dholmes at openjdk.org Fri Nov 4 01:59:34 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Nov 2022 01:59:34 GMT Subject: RFR: 8296231: Fix MEMFLAGS for CHeapBitMaps In-Reply-To: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> References: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> Message-ID: <qoaPKIqIH13c4sippjx7LDHYZOWqJzx0T7G8rYoDVm8=.f718f5b9-d163-4514-844a-183c21e1e158@github.com> On Wed, 2 Nov 2022 14:24:07 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Some usages of CHeapBitMaps rely on the default value of the MEMFLAGS argument (mtInternal). This is undesirable, and should be fixed. > > I'd prefer to remove the default value, but there is currently a PR touching the BitMap classes, so I'd like to limit this Bug to only fixing the incorrect usage of mtInternal. Looks reasonable to me too. Thanks. I also "vote" against the MemFlagsMark - it would be too easy to include unintended allocations (that currently rely on default) and too hard to recognise you are within the scope of such a mark. 
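To make the concern about defaulted tags concrete, here is a minimal C++ sketch. `MemTag`, `accounted_bytes`, `DefaultedBitMap` and `ExplicitBitMap` are hypothetical names invented for this illustration; they are not HotSpot's MEMFLAGS or CHeapBitMap API. The only point is that a default argument lets a forgotten tag silently land in the "internal" bucket.

```cpp
#include <cstddef>
#include <map>

// Hypothetical stand-in for a memory-tracking tag; not the real MEMFLAGS enum.
enum class MemTag { Internal, GC, Compiler };

// Toy accounting table: bytes attributed to each tag.
inline std::map<MemTag, std::size_t>& accounted_bytes() {
  static std::map<MemTag, std::size_t> table;
  return table;
}

// Variant A: the tag has a default, so a caller can omit it without noticing.
struct DefaultedBitMap {
  explicit DefaultedBitMap(std::size_t bits, MemTag tag = MemTag::Internal) {
    accounted_bytes()[tag] += (bits + 7) / 8;  // round up to whole bytes
  }
};

// Variant B: no default; every call site has to name its tag.
struct ExplicitBitMap {
  ExplicitBitMap(std::size_t bits, MemTag tag) {
    accounted_bytes()[tag] += (bits + 7) / 8;  // round up to whole bytes
  }
};
```

With the defaulted variant, `DefaultedBitMap bm(64);` compiles and is charged to `MemTag::Internal`; with the explicit variant the same omission is a compile error, which appears to be the property the review is after.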
------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/10948 From yadongwang at openjdk.org Fri Nov 4 02:27:32 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Fri, 4 Nov 2022 02:27:32 GMT Subject: RFR: 8295967: RISC-V: Support negVI/negVL instructions for Vector API In-Reply-To: <V3-BBH51-VglKSs5HqRVGaH07heklN9woQDk_Myh3bs=.1bf0112a-8428-47d9-ab83-0b0ff6f14d8c@github.com> References: <V3-BBH51-VglKSs5HqRVGaH07heklN9woQDk_Myh3bs=.1bf0112a-8428-47d9-ab83-0b0ff6f14d8c@github.com> Message-ID: <vO5mEZf7a5P6lcAuQi7xv_6AzVGyLp4YZ8DJUnOs0y8=.5724aa6c-c5bd-4ddf-af46-0f0de377159b@github.com> On Thu, 27 Oct 2022 05:38:03 GMT, Dingli Zhang <dzhang at openjdk.org> wrote: > Hi, > > This patch will add support of `NegVI`, `NegVL` for RISC-V and was implemented by referring to riscv-v-spec v1.0 [1]. > > Tests are performed on qemu with parameter `-cpu rv64,v=true,vlen=256,vext_spec=v1.0`. By adding the `-XX:+PrintAssembly -Xcomp -XX:-TieredCompilation -XX:+LogCompilation -XX:LogFile=compile.log` parameter when executing the test cases[2] [3] , the compilation log is as follows: > > > 100 B16: # out( B37 B17 ) <- in( B15 ) Freq: 77.0109 > 100 # castII of R9, #@castII > 100 addw R29, R9, zr #@convI2L_reg_reg > 104 slli R29, R29, (#2 & 0x3f) #@lShiftL_reg_imm > 108 add R12, R30, R29 # ptr, #@addP_reg_reg > 10c addi R12, R12, #16 # ptr, #@addP_reg_imm > 110 vle V1, [R12] #@loadV > 118 vrsub.vx V1, V1, V1 #@vnegI > 120 bgeu R9, R10, B37 #@cmpU_branch P=0.000001 C=-1.000000 > > > At the same time, the following assembly code will be generated: > > > 0x000000400ccfa618: .4byte 0x10072d7 > 0x000000400ccfa61c: .4byte 0xe1040d7 ;*invokestatic unaryOp {reexecute=0 rethrow=0 return_oop=0} > ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 91 (line 684) > ; - jdk.incubator.vector.Int256Vector::lanewise at 2 (line 273) > ; - jdk.incubator.vector.Int256Vector::lanewise at 2 (line 41) > ; - Int256VectorTests::NEGInt256VectorTests at 73 (line 
5216) > > > PS: `0x10072d7/0xe1040d7` are the machine code for `vsetvli/vrsub`. > > After we implement these nodes, by using `-XX:+UseRVV`, the number of assembly instructions is reduced by about ~50% because of the different execution paths with the number of loops, similar to `AddTest` [4]. > > In the meantime, I also add an assembly pseudoinstruction `vneg.v` in macroAssembler_riscv. > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#111-vector-single-width-integer-add-and-subtract > [2] https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java > [3] https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector/Long256VectorTests.java > [4] https://github.com/zifeihan/vector-api-test-rvv/blob/master/vector-api-rvv-performance.md > > Please take a look and have some reviews. Thanks a lot. > > ## Testing: > > - hotspot and jdk tier1 on unmatched board without new failures > - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu > - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu lgtm ------------- Marked as reviewed by yadongwang (Author). 
PR: https://git.openjdk.org/jdk/pull/10880 From dholmes at openjdk.org Fri Nov 4 02:28:37 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Nov 2022 02:28:37 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v9] In-Reply-To: <m7cjzZhyItkm0kZxy1xOd5je44p0YVZ2ecV8fc4O5cc=.5e9dd8af-86bd-47de-809b-23dd9f8cc126@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> <m7cjzZhyItkm0kZxy1xOd5je44p0YVZ2ecV8fc4O5cc=.5e9dd8af-86bd-47de-809b-23dd9f8cc126@github.com> Message-ID: <7O2UW4OR4tHoZQvToYmfbLUwO10O8V8GkOXtFfnm2HE=.0fe15762-e053-47d0-81bd-50bda3ce3684@github.com> On Thu, 3 Nov 2022 14:50:49 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote: >> As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. >> >> The text format and contents are tentative, please review. 
>> >> Here is an example output when using `findmethod()`: >> >> "Executing findmethod" >> flags (bitmask): >> 0x01 - print names of methods >> 0x02 - print bytecodes >> 0x04 - print the address of bytecodes >> 0x08 - print info for invokedynamic >> 0x10 - print info for invokehandle >> >> [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} >> 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V >> 0 iconst_0 >> 1 istore_1 >> 2 iload_1 >> 3 iconst_2 >> 4 if_icmpge 24 >> 7 getstatic 7 <Concat0.s/Ljava/lang/String;> >> 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> >> BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> >> arguments[1] = { >> 000 >> } >> ConstantPoolCacheEntry: 4 >> - this: 0x00007fffa0400570 >> - bytecode 1: invokedynamic ba >> - bytecode 2: nop 00 >> - cp index: 13 >> - F1: [ 0x00000008000c8658] >> - F2: [ 0x0000000000000003] >> - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) >> - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] >> - tos: object >> - local signature: 1 >> - has appendix: 1 >> - forced virtual: 0 >> - final: 1 >> - virtual Final: 0 >> - resolution Failed: 0 >> - num Parameters: 02 >> Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; >> appendix: java.lang.invoke.BoundMethodHandle$Species_LL >> {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' >> - ---- fields (total size 5 words): >> - private 'customizationCount' 'B' @12 0 (0x00) >> - private volatile 'updateInProgress' 'Z' @13 false (0x00) >> - private 
final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) >> - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) >> - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) >> - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) >> - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) >> - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) >> ------------- >> 15 putstatic 17 <Concat0.d/Ljava/lang/String;> >> 18 iinc #1 1 >> 21 goto 2 >> 24 return > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed gtest A couple of copyright notice issues need fixing. Thanks. src/hotspot/share/oops/cpCache.cpp line 2: > 1: /* > 2: * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved. You need to restore the 1998 copyright year here. ------------- Changes requested by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10860 From dholmes at openjdk.org Fri Nov 4 02:28:38 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Nov 2022 02:28:38 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v6] In-Reply-To: <zJscKEkFsKwwe19iTWRsXnzYbRFmgu53aY7WQS5fKi8=.11639379-71c2-408e-96c1-8ab3e3ed2fc3@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> <zKINLOkBVpUwjlgj2aznBnSpZLI5z81EsUTC6xCvx-c=.40b5f108-8cb6-402f-92d0-5a6510271a3b@github.com> <zJscKEkFsKwwe19iTWRsXnzYbRFmgu53aY7WQS5fKi8=.11639379-71c2-408e-96c1-8ab3e3ed2fc3@github.com> Message-ID: <UNSAHzSbioaSnZ4dMqAzdXdWqYaLKTvbXd424V6BlLA=.d4ab5a2c-406e-4a83-8346-c380bd64357c@github.com> On Tue, 1 Nov 2022 20:59:51 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> Matias Saavedra Silva has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Added gtest > > test/hotspot/gtest/oops/test_cpCache_output.cpp line 3: > >> 1: /* >> 2: * Copyright (c) 2016, 2022, Oracle and/or its affiliates. All rights reserved. >> 3: * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > > Copyright should be 2022 only (the comma is required after 2022). > > > * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved. This has not been fixed.
------------- PR: https://git.openjdk.org/jdk/pull/10860 From duke at openjdk.org Fri Nov 4 02:41:27 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Fri, 4 Nov 2022 02:41:27 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> Message-ID: <Bprfi_19Qo6S2VnO8ctqsPK9DTv0Fc7DzfqJTjWYJG8=.6fca8010-279c-4ca5-ad06-55ed23fa40ef@github.com> On Thu, 3 Nov 2022 16:06:47 GMT, Ashutosh Mehra <duke at openjdk.org> wrote: > This is an attempt to unify the two different approaches for using archived heap regions. Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions. > > In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap. > When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific. > When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region based collectors. > > This PR attempts to add new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also allows the flexibility for the GC implementation to decide where and how to reserve the space for the archived regions. 
For instance, G1 implementation can continue to attempt to allocate the space towards the end of the heap. > This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions viz G1, serial, parallel and epsilon. I think I figured out the reason for crashes with `SharedStringsStress.java` and `GCSharedStringsDuringDump.java` tests. In the implementation for G1 the regions are allocated one after the other, without taking into account the gap that may have existed between the regions at dump time. This can result in an object spanning across two G1 regions. For instance, if there are two regions at dump time (denoted by `X` and `Y`) and the first region occupies one G1 region R1 and the second region occupies two G1 regions - R2 and R3, then they would be represented at dump time as:

R1         R2         R3         R4
|XXXXXX    |YYYYYYYYYY|YYYYYYYYYY|

The blank space towards the end of R1 denotes that the region is not fully occupied. At run time these would be mapped as follows:

R1         R2         R3         R4
|XXXXXXYYYY|YYYYYYYYYY|YYYYYY    |

At dump time the objects are all within a region. But at run time, it is possible for an object near the region boundary to overflow into the next region. If that happens we would hit the assertion `assert(next_addr == top())` in `HeapRegion::update_bot()`.
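The packing problem described above can be sketched in a few lines of C++. This is only a toy model under stated assumptions: `ArchivedObject`, `ArchivedRegion`, `pack_contiguously` and `crosses_region_boundary` are hypothetical names made up for the illustration and do not correspond to G1 or CDS code, and sizes are in arbitrary "words".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: an object occupying [start, start + size) words,
// measured from the start of the mapped range.
struct ArchivedObject {
  std::size_t start;
  std::size_t size;
};

// Hypothetical model of one archived region as written at dump time.
struct ArchivedRegion {
  std::size_t dump_start;               // offset of the region at dump time
  std::size_t used;                     // words actually occupied
  std::vector<ArchivedObject> objects;  // object offsets at dump time
};

// True if the first and last word of the object fall into different
// GC regions of the given size.
bool crosses_region_boundary(const ArchivedObject& obj, std::size_t region_size) {
  std::size_t first_region = obj.start / region_size;
  std::size_t last_region = (obj.start + obj.size - 1) / region_size;
  return first_region != last_region;
}

// Lays the archived regions out back to back, dropping the unused tail of
// each one, and returns the resulting object positions.
std::vector<ArchivedObject> pack_contiguously(const std::vector<ArchivedRegion>& regions) {
  std::vector<ArchivedObject> result;
  std::size_t cursor = 0;  // next free word in the packed layout
  for (const ArchivedRegion& r : regions) {
    for (const ArchivedObject& o : r.objects) {
      result.push_back({cursor + (o.start - r.dump_start), o.size});
    }
    cursor += r.used;
  }
  return result;
}
```

With a region size of 10, an `X` region using 6 words and a `Y` region starting at offset 10, every object is region-aligned at dump time, but after packing the first `Y` object starts at word 6 and can straddle the R1/R2 boundary, which matches the picture in the message.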
------------- PR: https://git.openjdk.org/jdk/pull/10970 From duke at openjdk.org Fri Nov 4 02:46:59 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Fri, 4 Nov 2022 02:46:59 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> Message-ID: <wNa6TlSG6gHgqtkTk47_RzVRnr0R_mRz-8jq4hLRmlU=.c4cf6e17-1877-4c44-a148-9838d44e4ebd@github.com> On Thu, 3 Nov 2022 16:06:47 GMT, Ashutosh Mehra <duke at openjdk.org> wrote: > This is an attempt to unify the two different approaches for using archived heap regions. Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions. > > In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap. > When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific. > When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region based collectors. > > This PR attempts to add new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also allows the flexibility for the GC implementation to decide where and how to reserve the space for the archived regions. 
For instance, G1 implementation can continue to attempt to allocate the space towards the end of the heap. > This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions viz G1, serial, parallel and epsilon. I am thinking this situation of an object overflowing into the next region can happen in the earlier G1 implementation as well if the G1 region size is different at run time than at dump time. Taking the above example, if the G1 region size at run time is 1.5x the region size at dump time, then we would end up with something like this:

R1              R2              R3
|XXXXXX    YYYYY|YYYYYYYYYYYYYYY|

So there is a chance that the object towards the end of R1 may overflow into R2. Unless I missed something, it looks like this is a potential issue in the earlier implementation as well. ------------- PR: https://git.openjdk.org/jdk/pull/10970 From duke at openjdk.org Fri Nov 4 03:20:11 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 4 Nov 2022 03:20:11 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test.
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Merge remote-tracking branch 'origin/master' into avx512-poly - address Jamil's review - invalidkeyexception and some review comments - extra whitespace character - assembler checks and test case fixes - Merge remote-tracking branch 'origin/master' into avx512-poly - Merge remote-tracking branch 'origin' into avx512-poly - further restrict UsePolyIntrinsics with supports_avx512vlbw - missed white-space fix - - Fix whitespace and copyright statements - Add benchmark - ...
and 2 more: https://git.openjdk.org/jdk/compare/9d3b4ef2...38d9e83c ------------- Changes: https://git.openjdk.org/jdk/pull/10582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=06 Stats: 1852 lines in 32 files changed: 1815 ins; 3 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Nov 4 03:24:33 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 4 Nov 2022 03:24:33 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <0xJMPRdK0h3UJBYxqeLMfp1baL8xoaUpNcAZOtrFLKo=.d5c1020e-9e61-4800-bb52-9adbdd17e19f@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> <rWyVuKzBYxwlH2VD7NTEhn3RmEkPD7Y1xZxLdzRC9PU=.afb331b2-0940-46ff-8801-2debf135cdc9@github.com> <ciNkfW7vps_NlMo48k2iiEF-n3eODjTkMXBnh5E6UOs=.d01b33ec-3807-4738-a155-4d8dad3bd67a@github.com> <ho1jejeOhwvS4iR99KyneWiWr_2a9OZCNfQb-3itkV8=.bb6ecc9c-5dfa-450f-b886-a33118c40db4@github.com> <0xJMPRdK0h3UJBYxqeLMfp1baL8xoaUpNcAZOtrFLKo=.d5c1020e-9e61-4800-bb52-9adbdd17e19f@github.com> Message-ID: <NssSsJxvYeCOKX6UGg_bACpV4eiFeJMBWjpceF22X_o=.c42e7e49-ac46-4ca1-a267-60ec0fc129dd@github.com> On Fri, 28 Oct 2022 21:55:59 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> I flip-flopped on this.. I already had the code for the exception.. and already described the potential fix. So rather than remove the code, I pushed the described fix. It's always easier to remove the extra field I added. Let me know what you think about the 'backdoor' field. > > Well, what you're doing achieves what we're looking for, thanks for making that change. I think I'd like to see that value set on construction and not be mutable from outside the object.
Something like this: > > - place a `private final boolean checkWeakKey` up near where all the other fields are defined. > - the no-args Poly1305 is implemented as `this(true)` > - an additional constructor is created `Poly1305(boolean checkKey)` which sets `checkWeakKey` true or false as provided by the parameter. > - in setRSVals you should be able to wrap lines 296-310 inside a single `if (checkWeakKey)` block. > - In the Poly1305KAT the `new Poly1305()` becomes `new Poly1305(false)`. done ------------- PR: https://git.openjdk.org/jdk/pull/10582 From fyang at openjdk.org Fri Nov 4 03:28:54 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 4 Nov 2022 03:28:54 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v4] In-Reply-To: <YhdnZ-IdjKAnIrjsv__vuAuxMoemXVp9pALILaVLffs=.732b1b3c-2044-4e61-bdb7-dad6dd83549f@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <YhdnZ-IdjKAnIrjsv__vuAuxMoemXVp9pALILaVLffs=.732b1b3c-2044-4e61-bdb7-dad6dd83549f@github.com> Message-ID: <ZTuP8PiDBpc392_QaWT8Rvdiy3jaw-G96uNCnvfdRxo=.bbb0d931-0b0e-4d1e-b19f-d5ff038d41ac@github.com> On Thu, 3 Nov 2022 16:36:49 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. >> >> It passes `hotspot:tier1` test suite > > Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: > > - fixup! remove dead code > - remove dead code Updated changes look good except for the indentation issue.
src/hotspot/os_cpu/linux_riscv/prefetch_linux_riscv.inline.hpp line 32: > 30: > 31: inline void Prefetch::read(const void *loc, intx interval) { > 32: if (interval >= 0 && UseZicbop) { You might want to fix indentation for Prefetch::read/write here. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10884 From duke at openjdk.org Fri Nov 4 03:54:13 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 4 Nov 2022 03:54:13 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7] In-Reply-To: <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> Message-ID: <gAD-wSexeb2a5tq1LEFJqWRS6QF0Fi8EhAxZ1L5cBRc=.9f93ba53-64f2-4965-a1cc-323c7107e08c@github.com> On Fri, 4 Nov 2022 03:20:11 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ±
14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s > > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge remote-tracking branch 'origin/master' into avx512-poly > - address Jamil's review > - invalidkeyexception and some review comments > - extra whitespace character > - assembler checks and test case fixes > - Merge remote-tracking branch 'origin/master' into avx512-poly > - Merge remote-tracking branch 'origin' into avx512-poly > - further restrict UsePolyIntrinsics with supports_avx512vlbw > - missed white-space fix > - - Fix whitespace and copyright statements > - Add benchmark > - ... and 2 more: https://git.openjdk.org/jdk/compare/9d3b4ef2...38d9e83c @jnimeh Hopefully the last change addresses your pending comments. More data, new data...
datasize | master | optimized | disabled | opt/mst | dis/mst
-- | -- | -- | -- | -- | --
32 | 3218169 | 3476352 | 3126538 | 1.08 | 0.97
64 | 2858030 | 3391015 | 2846735 | 1.19 | 1.00
128 | 2396796 | 3239888 | 2406931 | 1.35 | 1.00
256 | 1780679 | 3063749 | 1765664 | 1.72 | 0.99
512 | 1168824 | 2918524 | 1153009 | 2.50 | 0.99
1024 | 648772.1 | 2716787 | 688467.7 | 4.19 | 1.06
2048 | 357009 | 2382723 | 376023.7 | 6.67 | 1.05
16384 | 48854.33 | 896850 | 53104.68 | 18.36 | 1.09
1048576 | 771.461 | 15088.63 | 846.247 | 19.56 | 1.10

src/hotspot/share/opto/library_call.cpp line 7016:

> 7014: Node* rObj = new CheckCastPPNode(control(), rFace, rtype);
> 7015: rObj = _gvn.transform(rObj);
> 7016: Node* rlimbs = load_field_from_object(rObj, "limbs", "[J");

@jnimeh if you could be particularly 'critical' here please? I generally know what I wanted to accomplish. And stepped through things with a debugger... but all the various IR types and conversions, I just don't know. I copied things from AES, which seem to work, as they do here, but I don't _understand_ the code.

i.e. recursive `getfield`s `((IntegerPolynomial$MutableElement)(this.a)).limbs` plus checks if we know field offsets: if (recursive) classes are loaded. But if not loaded, crashing with assert? Seems 'rude'. I think the Poly1305 class constructor running would have forced the classes here to load so nothing to worry about, so I suppose assert is enough.

src/hotspot/share/opto/library_call.cpp line 7027:

> 7025: // Node* cmp = _gvn.transform(new CmpINode(load_array_length(alimbs), intcon(5)));
> 7026: // Node* bol = _gvn.transform(new BoolNode(cmp, BoolTest::eq));
> 7027: // Node* if_eq = generate_slow_guard(bol, slow_region);

@jnimeh I had "valiantly" tried to do a length check here, but couldn't find where to steal code from! If you have some suggestions... Meanwhile, I decided that perhaps a Java check would not be _that_ bad for non-intrinsic code.
See `checkLimbsForIntrinsic`; I had to change the interface `IntegerModuloP` which initially felt like a hack. But perhaps the Java check is 'alright'; it reminds Java developers that there is a related intrinsic.

test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/java.base/com/sun/crypto/provider/Poly1305IntrinsicFuzzTest.java line 39:

> 37: public static void main(String[] args) throws Exception {
> 38: //Note: it might be useful to increase this number during development of new Poly1305 intrinsics
> 39: final int repeat = 100;

@jnimeh FYI... In case you end up supporting other architectures, I left a trail (and lots of 'math' comments in the assembler)

-------------

PR: https://git.openjdk.org/jdk/pull/10582

From fyang at openjdk.org Fri Nov 4 05:04:03 2022
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 4 Nov 2022 05:04:03 GMT
Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v4]
In-Reply-To: <YhdnZ-IdjKAnIrjsv__vuAuxMoemXVp9pALILaVLffs=.732b1b3c-2044-4e61-bdb7-dad6dd83549f@github.com>
References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com>
 <YhdnZ-IdjKAnIrjsv__vuAuxMoemXVp9pALILaVLffs=.732b1b3c-2044-4e61-bdb7-dad6dd83549f@github.com>
Message-ID: <I8SENNacs3QgWAkim39Ay7hsqOdOZ0-gcEuFuSIC6SA=.31845e8c-95b6-4e6c-acc9-b37521342706@github.com>

On Thu, 3 Nov 2022 16:36:49 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available.
>>
>> It passes `hotspot:tier1` test suite
>
> Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision:
>
>  - fixup!
remove dead code
>  - remove dead code

src/hotspot/cpu/riscv/riscv.ad line 5190:

> 5188:
> 5189: instruct prefetchalloc( memory mem ) %{
> 5190: match(PrefetchAllocation mem);

PS: Should we also put this under control of option UseZicbop like you do in Prefetch::read/write? Did you check whether those (prefetchalloc and Prefetch::read/write) will ever be used/called when AllocatePrefetchStyle is 0 (which is the case when UseZicbop is false)?

-------------

PR: https://git.openjdk.org/jdk/pull/10884

From iklam at openjdk.org Fri Nov 4 05:18:27 2022
From: iklam at openjdk.org (Ioi Lam)
Date: Fri, 4 Nov 2022 05:18:27 GMT
Subject: RFR: 8296263: Uniform APIs for using archived heap regions
In-Reply-To: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com>
References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com>
Message-ID: <TRoDLfcCCxNIGwWPb4W1eJtT7BAp2zjZhFjvSH0aleM=.1fa97bf3-ec5c-464b-86a1-7d320b1f1178@github.com>

On Thu, 3 Nov 2022 16:06:47 GMT, Ashutosh Mehra <duke at openjdk.org> wrote:

> This is an attempt to unify the two different approaches for using archived heap regions. Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions.
>
> In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap.
> When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific.
> When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region based collectors.
>
> This PR attempts to add new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also allows the flexibility for the GC implementation to decide where and how to reserve the space for the archived regions. For instance, G1 implementation can continue to attempt to allocate the space towards the end of the heap.
> This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions viz G1, serial, parallel and epsilon.

I am not sure if the existing implementation is 100% correct, but for these test cases, I think we are probably saved by this code:

    if (!is_aligned(relocated_closed_heap_region_bottom, HeapRegion::GrainBytes)) {
      // Align the bottom of the closed archive heap regions at G1 region boundary.
      // This will avoid the situation where the highest open region and the lowest
      // closed region sharing the same G1 region. Otherwise we will fail to map the
      // open regions.
      size_t align = size_t(relocated_closed_heap_region_bottom) % HeapRegion::GrainBytes;
      delta -= align;
      log_info(cds)("CDS heap data needs to be relocated lower by a further " SIZE_FORMAT
                    " bytes to " INTX_FORMAT " to be aligned with HeapRegion::GrainBytes",
                    align, delta);
      set_shared_heap_runtime_delta(delta);
      relocated_closed_heap_region_bottom = heap_region_runtime_start_address(si);
      _heap_pointers_need_patching = true;
    }

G1 regions are at least 1MB, and are always a power of 2. By patching SharedStringsStress.java with this, I can get the CA1 and OA0 regions to be not aligned by GrainBytes, but that doesn't seem to cause the test to fail.

    - TestCommon.concat(vmOptionsPrefix, "HelloString"));
    + TestCommon.concat(vmOptionsPrefix, "-Xlog:cds=debug", "-Xmx6g", "HelloString"));

In any case, I think we can consider first changing the way the regions are written ([JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344)) so that they can be more easily mapped by various collectors. (Also, tactically, we should probably first change G1 to use the new "Uniform API" you are thinking about, but leave the other collectors unchanged. This way, we can gradually test things out and fix the other collectors in subsequent RFEs).

Currently, when writing the archived heap, we allocate a G1 region and write objects into it, from bottom to top. When it fills up, we allocate another G1 region that's immediately below, and start filling it from bottom to top. At the end, we merge all the fully-filled regions into the CA0 region, and make the last, half-filled region CA1. (Same for the OA0, OA1 regions, but usually the OA0 region never has more than 1MB of objects, so we'd never have the OA1 region).

This is kind of kludgy. We should be able to first determine all objects to be archived, and then write them out as a single contiguous "closed" region, and a single contiguous "open" region. When filling out these regions, we can pack the objects so that they will never cross a 1MB boundary.

Also, I think it may not even be worthwhile to have the "closed" region and treat it specially at runtime. We can have just a single contiguous block of archived objects like this, where S are the String objects and their char arrays, and O are the other types of objects

    OOOOOOOOOOOSSSSSSSSSSS

At runtime, we allocate enough G1 regions from the top of the heap to accommodate the archived objects, and put a dummy object at the bottom to fix the bottom-most region.
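
The packing idea above (objects laid out so that none crosses a 1MB boundary) could be sketched roughly as follows. This is only an illustration under stated assumptions, not HotSpot code: `pack_offsets` and `kRegionBytes` are hypothetical names, sizes are in bytes, and every object is assumed to fit within one 1MB region.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical boundary size for this sketch: 1MB, as in the discussion.
constexpr std::size_t kRegionBytes = 1024 * 1024;

// Hypothetical helper (not part of HotSpot): assign each object an offset in
// one contiguous buffer so that no object straddles a region boundary.
// Assumes every size is <= region_bytes.
std::vector<std::size_t> pack_offsets(const std::vector<std::size_t>& sizes,
                                      std::size_t region_bytes = kRegionBytes) {
  std::vector<std::size_t> offsets;
  offsets.reserve(sizes.size());
  std::size_t cursor = 0;
  for (std::size_t size : sizes) {
    std::size_t used_in_region = cursor % region_bytes;
    // If the object would spill into the next region, skip ahead to the next
    // boundary; a real writer would have to cover the gap with a filler object.
    if (used_in_region + size > region_bytes) {
      cursor += region_bytes - used_in_region;
    }
    offsets.push_back(cursor);
    cursor += size;
  }
  return offsets;
}
```

For example, two 600KB objects followed by a 100-byte object would be placed at offsets 0, 1048576 and 1662976 in this model: the second object is pushed to the next boundary, and the third still fits behind it.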

(The reason we align the archived regions to the top of the G1 heap is that the top of the heap usually has the same narrowOop for various heap sizes, so we can usually avoid patching the embedded oop pointers. This is a trade-off with other collectors, which may not allow you to start allocating memory from the top. We may want to reconsider this.)

All the Strings are always in the interned table so they will never be collected. Also, we already computed their hashcode, so they are never written into (unless you `synchronize` on them at runtime). So for the region(s) that contain only the S objects, we can effectively share the memory across multiple processes, and the GC will never collect them.

Anyway, we usually just have a few MBs of archived objects, so it may not matter whether we keep them immutable or not.

*******

I want to thank you for starting to work in this area. Going forward, I think we need more discussion and design before we can decide exactly what to do.

-------------

PR: https://git.openjdk.org/jdk/pull/10970

From fyang at openjdk.org Fri Nov 4 05:23:27 2022
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 4 Nov 2022 05:23:27 GMT
Subject: RFR: 8286301: Port JEP 425 to RISC-V [v3]
In-Reply-To: <763IXbUVc8wSoGTW7KQz0mUSEQQ4B7WrBUGBtdGVEM4=.2d0a35ec-253b-4df4-ba27-5dc910070f05@github.com>
References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com>
 <VUwT5X0MCtAVQ0D531lePcYXx8zOI1ckRh63_AzhHRU=.3421aa87-a000-4539-9a9d-b8091df5ac9f@github.com>
 <-ycncxIQyNpinFjD3jcSBESB-Yp4GGZSdZ0Eqvva7DU=.2871f915-cf60-4364-abc8-9e9d22ed8709@github.com>
 <763IXbUVc8wSoGTW7KQz0mUSEQQ4B7WrBUGBtdGVEM4=.2d0a35ec-253b-4df4-ba27-5dc910070f05@github.com>
Message-ID: <RB39I-u0g0qwxdnNvtkrkiDVZTs1V9nFOI2Uz3UAHGM=.e36a5267-51f6-4bbb-9e42-96f1a2fb102f@github.com>

On Thu, 3 Nov 2022 21:26:52 GMT, Dean Long <dlong at openjdk.org> wrote:

>> I think aarch64 and riscv are different from x86_64 here due to possible padding in the frame[1][2].
>> So if we modify this assertion like: >> >> diff --git a/src/hotspot/share/runtime/continuationFreezeThaw.cpp b/src/hotspot/share/runtime/continuationFreezeThaw.cpp >> index 2ef48618ccb..c8e88b67f94 100644 >> --- a/src/hotspot/share/runtime/continuationFreezeThaw.cpp >> +++ b/src/hotspot/share/runtime/continuationFreezeThaw.cpp >> @@ -1050,7 +1050,7 @@ NOINLINE freeze_result FreezeBase::recurse_freeze_interpreted_frame(frame& f, fr >> const int fsize = f.fp() + frame::metadata_words + locals - stack_frame_top; >> >> intptr_t* const stack_frame_bottom = ContinuationHelper::InterpretedFrame::frame_bottom(f); >> - assert(stack_frame_bottom - stack_frame_top >= fsize, ""); // == on x86 >> + assert(stack_frame_bottom - stack_frame_top == fsize, ""); // == on x86 >> >> DEBUG_ONLY(verify_frame_top(f, stack_frame_top)); >> >> >> Then we will trigger assertion failure on linux-aarch64 running a simple virtual thread demo: >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/home/realfyang/openjdk-jdk/src/hotspot/share/runtime/continuationFreezeThaw.cpp >> :1053), pid=2680946, tid=2680964 >> # Error: assert(stack_frame_bottom - stack_frame_top == fsize) failed >> # >> # JRE version: OpenJDK Runtime Environment (20.0) (slowdebug build 20-internal-adhoc.realfyang.open >> jdk-jdk) >> # Java VM: OpenJDK 64-Bit Server VM (slowdebug 20-internal-adhoc.realfyang.openjdk-jdk, mixed mode, >> sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) >> # Problematic frame: >> # V [libjvm.so+0x865c98] FreezeBase::recurse_freeze_interpreted_frame(frame&, frame&, int, bool)+ >> 0xb8 >> # >> >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/frame_aarch64.hpp#L62 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/frame_riscv.hpp#L62 > > OK, skipping the assert for now. 
Regarding `NOT_RISCV64(+ frame::metadata_words)` @reinrich just posted the PR for the PPC64 port (see JDK-8286302), which introduces metadata_words_at_top/metadata_words_at_bottom. I'm not sure what values RISC-V would use for these new constants, but merging with the PPC64 changes might allow a platform-independent setting of fsize here. Hi, I went through the PPC64 changes to shared code and I think for RISC-V we should define metadata_words_at_top and metadata_words_at_bottom to 0 and 2 (which equals metadata_words as defined by this PR) respectively. The PPC64 changes will help eliminate the RISC-V-specific change made in stackChunkOopDesc::is_usable_in_chunk in file src/hotspot/share/oops/stackChunkOop.inline.hpp. But we might still need a RISC-V platform-dependent change here due to the particularity of the RISC-V frame structure. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Fri Nov 4 06:27:39 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 4 Nov 2022 06:27:39 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v4] In-Reply-To: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> Message-ID: <a6MAXdT5pkVxtdVKYR4VaKhWNWg3l5lw3KBwH7Wk9MI=.a4981486-26f7-40a9-bfb2-03534e053aca@github.com> > Hi, > > Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. > > This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. > Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't > affect the rest of the world in theory. > > There exists some differences in frame structure between AArch64 and RISC-V. 
> For AArch64, we have:
>
> enum {
>   link_offset = 0,
>   return_addr_offset = 1,
>   sender_sp_offset = 2
> };
>
> While for RISC-V, we have:
>
> enum {
>   link_offset = -2,
>   return_addr_offset = -1,
>   sender_sp_offset = 0
> };
>
> So we need adaptations in some places where the code relies on the value of sender_sp_offset to work.
> Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to
> evaluate more on its impact on performance.
>
> Testing on Linux-riscv64 HiFive Unmatched board:
> - Minimal, Client and Server release & fastdebug build OK.
> - Passed tier1-tier4 tests (release build).
> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build).
> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regressions are introduced (release build).

Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: - Code cleanup - Merge branch 'master' into 8286301 - Merge branch 'master' into 8286301 - Fix - 8286301: JEP 425 to RISC-V ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10917/files - new: https://git.openjdk.org/jdk/pull/10917/files/06302f9d..850c8958 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10917&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10917&range=02-03 Stats: 3921 lines in 192 files changed: 2334 ins; 976 del; 611 mod Patch: https://git.openjdk.org/jdk/pull/10917.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10917/head:pull/10917 PR: https://git.openjdk.org/jdk/pull/10917 From yadongwang at openjdk.org Fri Nov 4 08:30:38 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Fri, 4 Nov 2022 08:30:38 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v4] In-Reply-To: <I8SENNacs3QgWAkim39Ay7hsqOdOZ0-gcEuFuSIC6SA=.31845e8c-95b6-4e6c-acc9-b37521342706@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <YhdnZ-IdjKAnIrjsv__vuAuxMoemXVp9pALILaVLffs=.732b1b3c-2044-4e61-bdb7-dad6dd83549f@github.com> <I8SENNacs3QgWAkim39Ay7hsqOdOZ0-gcEuFuSIC6SA=.31845e8c-95b6-4e6c-acc9-b37521342706@github.com> Message-ID: <l79EUQ77p5jB19rocBm-4V3663GHoH842J4pD_sZsYM=.140ad281-7b99-47dd-92e2-aa71d59ee1e4@github.com> On Fri, 4 Nov 2022 05:00:36 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: >> >> - fixup! remove dead code >> - remove dead code > > src/hotspot/cpu/riscv/riscv.ad line 5190: > >> 5188: >> 5189: instruct prefetchalloc( memory mem ) %{ >> 5190: match(PrefetchAllocation mem); > > PS: Should we also put this under control of option UseZicbop like you do in Prefetch::read/write? 
Did you checked whether those (prefetchalloc and Prefetch::read/write) will ever be used/called when AllocatePrefetchStyle is 0 (which is the case when UseZicbop is false). The generation of PrefetchAllocationNode is controlled by AllocatePrefetchStyle. It should be fine if UseZicbop is associated with this option. ------------- PR: https://git.openjdk.org/jdk/pull/10884 From fyang at openjdk.org Fri Nov 4 08:30:39 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 4 Nov 2022 08:30:39 GMT Subject: RFR: 8295967: RISC-V: Support negVI/negVL instructions for Vector API In-Reply-To: <V3-BBH51-VglKSs5HqRVGaH07heklN9woQDk_Myh3bs=.1bf0112a-8428-47d9-ab83-0b0ff6f14d8c@github.com> References: <V3-BBH51-VglKSs5HqRVGaH07heklN9woQDk_Myh3bs=.1bf0112a-8428-47d9-ab83-0b0ff6f14d8c@github.com> Message-ID: <TyO30GtbiL2KENM8cguCXmC0Q4JprHurQQlq9yXK-CU=.2b9bba4e-9a13-43a1-af06-e2d7ff8013cc@github.com> On Thu, 27 Oct 2022 05:38:03 GMT, Dingli Zhang <dzhang at openjdk.org> wrote: > Hi, > > This patch will add support of `NegVI`, `NegVL` for RISC-V and was implemented by referring to riscv-v-spec v1.0 [1]. > > Tests are performed on qemu with parameter `-cpu rv64,v=true,vlen=256,vext_spec=v1.0`. 
By adding the `-XX:+PrintAssembly -Xcomp -XX:-TieredCompilation -XX:+LogCompilation -XX:LogFile=compile.log` parameter when executing the test cases[2] [3] , the compilation log is as follows: > > > 100 B16: # out( B37 B17 ) <- in( B15 ) Freq: 77.0109 > 100 # castII of R9, #@castII > 100 addw R29, R9, zr #@convI2L_reg_reg > 104 slli R29, R29, (#2 & 0x3f) #@lShiftL_reg_imm > 108 add R12, R30, R29 # ptr, #@addP_reg_reg > 10c addi R12, R12, #16 # ptr, #@addP_reg_imm > 110 vle V1, [R12] #@loadV > 118 vrsub.vx V1, V1, V1 #@vnegI > 120 bgeu R9, R10, B37 #@cmpU_branch P=0.000001 C=-1.000000 > > > At the same time, the following assembly code will be generated: > > > 0x000000400ccfa618: .4byte 0x10072d7 > 0x000000400ccfa61c: .4byte 0xe1040d7 ;*invokestatic unaryOp {reexecute=0 rethrow=0 return_oop=0} > ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 91 (line 684) > ; - jdk.incubator.vector.Int256Vector::lanewise at 2 (line 273) > ; - jdk.incubator.vector.Int256Vector::lanewise at 2 (line 41) > ; - Int256VectorTests::NEGInt256VectorTests at 73 (line 5216) > > > PS: `0x10072d7/0xe1040d7` are the machine code for `vsetvli/vrsub`. > > After we implement these nodes, by using `-XX:+UseRVV`, the number of assembly instructions is reduced by about ~50% because of the different execution paths with the number of loops, similar to `AddTest` [4]. > > In the meantime, I also add an assembly pseudoinstruction `vneg.v` in macroAssembler_riscv. > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#111-vector-single-width-integer-add-and-subtract > [2] https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java > [3] https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector/Long256VectorTests.java > [4] https://github.com/zifeihan/vector-api-test-rvv/blob/master/vector-api-rvv-performance.md > > Please take a look and have some reviews. Thanks a lot. 
> > ## Testing: > > - hotspot and jdk tier1 on unmatched board without new failures > - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu > - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu Looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10880 From jiefu at openjdk.org Fri Nov 4 08:36:06 2022 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 4 Nov 2022 08:36:06 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v4] In-Reply-To: <a6MAXdT5pkVxtdVKYR4VaKhWNWg3l5lw3KBwH7Wk9MI=.a4981486-26f7-40a9-bfb2-03534e053aca@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <a6MAXdT5pkVxtdVKYR4VaKhWNWg3l5lw3KBwH7Wk9MI=.a4981486-26f7-40a9-bfb2-03534e053aca@github.com> Message-ID: <WuyZNKkc9GrO5QkM5Hq4Wi4nwjEBFlC8Am5Un2FaAqE=.0921f561-76a7-4504-bcbf-1b86380747c2@github.com> On Fri, 4 Nov 2022 06:27:39 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Hi, >> >> Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. >> >> This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. >> Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't >> affect the rest of the world in theory. >> >> There exists some differences in frame structure between AArch64 and RISC-V. >> For AArch64, we have: >> >> enum { >> link_offset = 0, >> return_addr_offset = 1, >> sender_sp_offset = 2 >> }; >> >> While for RISC-V, we have: >> >> enum { >> link_offset = -2, >> return_addr_offset = -1, >> sender_sp_offset = 0 >> }; >> >> So we need adapations in some places where the code relies on value of sender_sp_offset to work. >> Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to >> evaluate more on its impact on performance. 
>> >> Testing on Linux-riscv64 HiFive Unmatched board: >> - Minimal, Client and Server release & fastdebug build OK. >> - Passed tier1-tier4 tests (release build). >> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). >> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Code cleanup > - Merge branch 'master' into 8286301 > - Merge branch 'master' into 8286301 > - Fix > - 8286301: JEP 425 to RISC-V The shared code change looks good to me. I'm not sure if it's possible to eliminate all the platform-dependent code. But the current version looks good enough to me. Thanks. ------------- Marked as reviewed by jiefu (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10917 From rrich at openjdk.org Fri Nov 4 08:57:25 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Nov 2022 08:57:25 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v4] In-Reply-To: <RB39I-u0g0qwxdnNvtkrkiDVZTs1V9nFOI2Uz3UAHGM=.e36a5267-51f6-4bbb-9e42-96f1a2fb102f@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <VUwT5X0MCtAVQ0D531lePcYXx8zOI1ckRh63_AzhHRU=.3421aa87-a000-4539-9a9d-b8091df5ac9f@github.com> <-ycncxIQyNpinFjD3jcSBESB-Yp4GGZSdZ0Eqvva7DU=.2871f915-cf60-4364-abc8-9e9d22ed8709@github.com> <763IXbUVc8wSoGTW7KQz0mUSEQQ4B7WrBUGBtdGVEM4=.2d0a35ec-253b-4df4-ba27-5dc910070f05@github.com> <RB39I-u0g0qwxdnNvtkrkiDVZTs1V9nFOI2Uz3UAHGM=.e36a5267-51f6-4bbb-9e42-96f1a2fb102f@github.com> Message-ID: <wxLlwo9jvWkFoA1EvJxMin-AE_F3tVD0w1yYHswHL8U=.20730d5c-c017-4559-8a20-8d132392d59d@github.com> On Fri, 4 Nov 2022 05:19:32 GMT, Fei Yang <fyang at openjdk.org> wrote: >> OK, skipping the assert for now. Regarding `NOT_RISCV64(+ frame::metadata_words)` @reinrich just posted the PR for the PPC64 port (see JDK-8286302), which introduces metadata_words_at_top/metadata_words_at_bottom. I'm not sure what values RISC-V would use for these new constants, but merging with the PPC64 changes might allow a platform-independent setting of fsize here. > > Hi, I went through the PPC64 changes to shared code and I think for RISC-V we should define metadata_words_at_top and metadata_words_at_bottom to 0 and 2 (which equals metadata_words as defined by this PR) respectively. The PPC64 changes will help eliminate the RISC-V-specific change made in stackChunkOopDesc::is_usable_in_chunk in file src/hotspot/share/oops/stackChunkOop.inline.hpp. But we might still need a RISC-V platform-dependent change here due to the particularity of the RISC-V frame structure. Thanks. The value for `fsize` as it is calculated is correct on PPC64 but only because errors compensate. 
The calculation makes assumptions about the position of fp which is not specified.

This is the assumed layout:

:                 :
:                 :
|                 |
|-----------------|
|                 |
|  locals array   |
|                 |<- callers_SP
===================
|                 |
| metadata at bottom |
|                 |<- FP
|-----------------|
|                 |
|                 |
|                 |
|                 |
|                 |<- SP
===================

Here the metadata lies outside [FP, SP] and needs to be added to FP-SP just like the size of the locals array.

This coincidentally matches the layout on PPC64:

:                 :
:                 :
|                 |
|-----------------|
|                 |
|  locals array   |
|                 |
|-----------------|
|                 |
| metadata at top |
|                 |<- FP / callers_SP
===================
|                 |
|                 |
|                 |
|                 |
|-----------------|
|                 |
| metadata at top |
|                 |<- SP
===================

I assume the layout on RISC-V is like this:

:                 :
:                 :
|                 |
|-----------------|
|                 |
|  locals array   |
|                 |<- FP / callers_SP
===================
|                 |
| metadata at bottom |
|                 |
|-----------------|
|                 |
|                 |
|                 |
|                 |
|                 |<- SP
===================

Here the metadata is included in [FP, SP]

We could change the line to

```c++
const int fsize = ContinuationHelper::InterpretedFrame::callers_sp(f) + frame::metadata_words_at_top + locals - stack_frame_top;
```

with a platform dependent implementation of `callers_sp()`

-------------

PR: https://git.openjdk.org/jdk/pull/10917

From dzhang at openjdk.org Fri Nov 4 08:59:30 2022
From: dzhang at openjdk.org (Dingli Zhang)
Date: Fri, 4 Nov 2022 08:59:30 GMT
Subject: RFR: 8295967: RISC-V: Support negVI/negVL instructions for Vector API
In-Reply-To: <vO5mEZf7a5P6lcAuQi7xv_6AzVGyLp4YZ8DJUnOs0y8=.5724aa6c-c5bd-4ddf-af46-0f0de377159b@github.com>
References: <V3-BBH51-VglKSs5HqRVGaH07heklN9woQDk_Myh3bs=.1bf0112a-8428-47d9-ab83-0b0ff6f14d8c@github.com>
 <vO5mEZf7a5P6lcAuQi7xv_6AzVGyLp4YZ8DJUnOs0y8=.5724aa6c-c5bd-4ddf-af46-0f0de377159b@github.com>
Message-ID: <vHkGCwsZRuIHSwYOTxCVDH6o1tghGpj4XIkplf96si8=.9fbb7ce5-f301-4226-a291-43842a985c89@github.com>

On Fri, 4 Nov 2022 02:25:11 GMT, Yadong Wang <yadongwang at openjdk.org> wrote:

>> Hi,
>>
>> This patch will add support of
`NegVI`, `NegVL` for RISC-V and was implemented by referring to riscv-v-spec v1.0 [1]. >> >> Tests are performed on qemu with parameter `-cpu rv64,v=true,vlen=256,vext_spec=v1.0`. By adding the `-XX:+PrintAssembly -Xcomp -XX:-TieredCompilation -XX:+LogCompilation -XX:LogFile=compile.log` parameter when executing the test cases[2] [3] , the compilation log is as follows: >> >> >> 100 B16: # out( B37 B17 ) <- in( B15 ) Freq: 77.0109 >> 100 # castII of R9, #@castII >> 100 addw R29, R9, zr #@convI2L_reg_reg >> 104 slli R29, R29, (#2 & 0x3f) #@lShiftL_reg_imm >> 108 add R12, R30, R29 # ptr, #@addP_reg_reg >> 10c addi R12, R12, #16 # ptr, #@addP_reg_imm >> 110 vle V1, [R12] #@loadV >> 118 vrsub.vx V1, V1, V1 #@vnegI >> 120 bgeu R9, R10, B37 #@cmpU_branch P=0.000001 C=-1.000000 >> >> >> At the same time, the following assembly code will be generated: >> >> >> 0x000000400ccfa618: .4byte 0x10072d7 >> 0x000000400ccfa61c: .4byte 0xe1040d7 ;*invokestatic unaryOp {reexecute=0 rethrow=0 return_oop=0} >> ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 91 (line 684) >> ; - jdk.incubator.vector.Int256Vector::lanewise at 2 (line 273) >> ; - jdk.incubator.vector.Int256Vector::lanewise at 2 (line 41) >> ; - Int256VectorTests::NEGInt256VectorTests at 73 (line 5216) >> >> >> PS: `0x10072d7/0xe1040d7` are the machine code for `vsetvli/vrsub`. >> >> After we implement these nodes, by using `-XX:+UseRVV`, the number of assembly instructions is reduced by about ~50% because of the different execution paths with the number of loops, similar to `AddTest` [4]. >> >> In the meantime, I also add an assembly pseudoinstruction `vneg.v` in macroAssembler_riscv. 
>> >> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#111-vector-single-width-integer-add-and-subtract >> [2] https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java >> [3] https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector/Long256VectorTests.java >> [4] https://github.com/zifeihan/vector-api-test-rvv/blob/master/vector-api-rvv-performance.md >> >> Please take a look and have some reviews. Thanks a lot. >> >> ## Testing: >> >> - hotspot and jdk tier1 on unmatched board without new failures >> - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu >> - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu > > lgtm @yadongw @RealFYang Thanks for the review! ------------- PR: https://git.openjdk.org/jdk/pull/10880 From dzhang at openjdk.org Fri Nov 4 09:11:07 2022 From: dzhang at openjdk.org (Dingli Zhang) Date: Fri, 4 Nov 2022 09:11:07 GMT Subject: Integrated: 8295967: RISC-V: Support negVI/negVL instructions for Vector API In-Reply-To: <V3-BBH51-VglKSs5HqRVGaH07heklN9woQDk_Myh3bs=.1bf0112a-8428-47d9-ab83-0b0ff6f14d8c@github.com> References: <V3-BBH51-VglKSs5HqRVGaH07heklN9woQDk_Myh3bs=.1bf0112a-8428-47d9-ab83-0b0ff6f14d8c@github.com> Message-ID: <O5j39wIHjuyw4uhPvJij5OBuUzQDk9cCJmt7OgaZ2pU=.f67406ea-7c8a-450f-93a1-e4499820644e@github.com> On Thu, 27 Oct 2022 05:38:03 GMT, Dingli Zhang <dzhang at openjdk.org> wrote: > Hi, > > This patch will add support of `NegVI`, `NegVL` for RISC-V and was implemented by referring to riscv-v-spec v1.0 [1]. > > Tests are performed on qemu with parameter `-cpu rv64,v=true,vlen=256,vext_spec=v1.0`. 
By adding the `-XX:+PrintAssembly -Xcomp -XX:-TieredCompilation -XX:+LogCompilation -XX:LogFile=compile.log` parameter when executing the test cases[2] [3] , the compilation log is as follows: > > > 100 B16: # out( B37 B17 ) <- in( B15 ) Freq: 77.0109 > 100 # castII of R9, #@castII > 100 addw R29, R9, zr #@convI2L_reg_reg > 104 slli R29, R29, (#2 & 0x3f) #@lShiftL_reg_imm > 108 add R12, R30, R29 # ptr, #@addP_reg_reg > 10c addi R12, R12, #16 # ptr, #@addP_reg_imm > 110 vle V1, [R12] #@loadV > 118 vrsub.vx V1, V1, V1 #@vnegI > 120 bgeu R9, R10, B37 #@cmpU_branch P=0.000001 C=-1.000000 > > > At the same time, the following assembly code will be generated: > > > 0x000000400ccfa618: .4byte 0x10072d7 > 0x000000400ccfa61c: .4byte 0xe1040d7 ;*invokestatic unaryOp {reexecute=0 rethrow=0 return_oop=0} > ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 91 (line 684) > ; - jdk.incubator.vector.Int256Vector::lanewise at 2 (line 273) > ; - jdk.incubator.vector.Int256Vector::lanewise at 2 (line 41) > ; - Int256VectorTests::NEGInt256VectorTests at 73 (line 5216) > > > PS: `0x10072d7/0xe1040d7` are the machine code for `vsetvli/vrsub`. > > After we implement these nodes, by using `-XX:+UseRVV`, the number of assembly instructions is reduced by about ~50% because of the different execution paths with the number of loops, similar to `AddTest` [4]. > > In the meantime, I also add an assembly pseudoinstruction `vneg.v` in macroAssembler_riscv. > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#111-vector-single-width-integer-add-and-subtract > [2] https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java > [3] https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector/Long256VectorTests.java > [4] https://github.com/zifeihan/vector-api-test-rvv/blob/master/vector-api-rvv-performance.md > > Please take a look and have some reviews. Thanks a lot. 
> > ## Testing: > > - hotspot and jdk tier1 on unmatched board without new failures > - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu > - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu This pull request has now been integrated. Changeset: c116ae75 Author: Dingli Zhang <dzhang at openjdk.org> Committer: Fei Yang <fyang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/c116ae75a7d1cdad82a451152fef5e4233fe19d6 Stats: 29 lines in 3 files changed: 29 ins; 0 del; 0 mod 8295967: RISC-V: Support negVI/negVL instructions for Vector API Reviewed-by: yadongwang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/10880 From fyang at openjdk.org Fri Nov 4 09:24:31 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 4 Nov 2022 09:24:31 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v4] In-Reply-To: <wxLlwo9jvWkFoA1EvJxMin-AE_F3tVD0w1yYHswHL8U=.20730d5c-c017-4559-8a20-8d132392d59d@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <VUwT5X0MCtAVQ0D531lePcYXx8zOI1ckRh63_AzhHRU=.3421aa87-a000-4539-9a9d-b8091df5ac9f@github.com> <-ycncxIQyNpinFjD3jcSBESB-Yp4GGZSdZ0Eqvva7DU=.2871f915-cf60-4364-abc8-9e9d22ed8709@github.com> <763IXbUVc8wSoGTW7KQz0mUSEQQ4B7WrBUGBtdGVEM4=.2d0a35ec-253b-4df4-ba27-5dc910070f05@github.com> <RB39I-u0g0qwxdnNvtkrkiDVZTs1V9nFOI2Uz3UAHGM=.e36a5267-51f6-4bbb-9e42-96f1a2fb102f@github.com> <wxLlwo9jvWkFoA1EvJxMin-AE_F3tVD0w1yYHswHL8U=.20730d5c-c017-4559-8a20-8d132392d59d@github.com> Message-ID: <zGLXygsH37HDIQgLY2PAxX8Vf1bcnC8rQQzM6JvVGRE=.f0587c1f-23be-462a-af45-5183867ebb48@github.com> On Fri, 4 Nov 2022 08:53:35 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> Hi, I went through the PPC64 changes to shared code and I think for RISC-V we should define metadata_words_at_top and metadata_words_at_bottom to 0 and 2 (which equals metadata_words as defined by this PR) respectively. 
The PPC64 changes will help eliminate the RISC-V-specific change made in stackChunkOopDesc::is_usable_in_chunk in file src/hotspot/share/oops/stackChunkOop.inline.hpp. But we might still need a RISC-V platform-dependent change here due to the particularity of the RISC-V frame structure. Thanks.
>
> The value for `fsize` as it is calculated is correct on PPC64, but only because the errors cancel out. The calculation makes assumptions about the position of fp, which is not specified.
>
> This is the assumed layout:
>
> :                    :
> :                    :
> |                    |
> |--------------------|
> |                    |
> | locals array       |
> |                    |<- callers_SP
> ======================
> |                    |
> | metadata at bottom |
> |                    |<- FP
> |--------------------|
> |                    |
> |                    |
> |                    |
> |                    |
> |                    |<- SP
> ======================
>
> Here the metadata lies outside [FP, SP] and needs to be added to FP-SP just like the size of the locals array.
>
> This coincidentally matches the layout on PPC64:
>
> :                    :
> :                    :
> |                    |
> |--------------------|
> |                    |
> | locals array       |
> |                    |
> |--------------------|
> |                    |
> | metadata at top    |
> |                    |<- FP / callers_SP
> ======================
> |                    |
> |                    |
> |                    |
> |                    |
> |--------------------|
> |                    |
> | metadata at top    |
> |                    |<- SP
> ======================
>
> I assume the layout on RISC-V is like this:
>
> :                    :
> :                    :
> |                    |
> |--------------------|
> |                    |
> | locals array       |
> |                    |<- FP / callers_SP
> ======================
> |                    |
> | metadata at bottom |
> |                    |
> |--------------------|
> |                    |
> |                    |
> |                    |
> |                    |
> |                    |<- SP
> ======================
>
> Here the metadata is included in [FP, SP].
>
> We could change the line to
>
> ```c++
> const int fsize = ContinuationHelper::InterpretedFrame::callers_sp(f) + frame::metadata_words_at_top + locals - stack_frame_top;
> ```
>
> with a platform-dependent implementation of `callers_sp()`

@reinrich : Yes, you are right about the frame layout of RISC-V. And your proposed solution looks reasonable to me. Since that will need testing across all platforms, maybe we can do that as a separate PR later on after this is merged.
The RISCV testing platform is relatively slow and this PR has been tested for a long time. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10917 From stefank at openjdk.org Fri Nov 4 09:36:45 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 4 Nov 2022 09:36:45 GMT Subject: RFR: 8296231: Fix MEMFLAGS for CHeapBitMaps In-Reply-To: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> References: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> Message-ID: <yOdT19VYdyTyWntYGcpDMt0ydhwFNnV1Kfu4mZ1ek5I=.9de0a960-cb07-40b8-b3f5-9e154b57319a@github.com> On Wed, 2 Nov 2022 14:24:07 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Some usages of CHeapBitMaps rely on the default value of the MEMFLAGS argument (mtInternal). This is undesirable, and should be fixed. > > I'd prefer to remove the default value, but there is currently a PR touching the BitMap classes, so I'd like to limit this Bug to only fixing the incorrect usage of mtInternal. Thanks for the reviews and the discussion around future alternatives around CHeap allocations. I'm pushing this fix as is and I expect that we'll continue talking allocation strategies in other threads/PRs. 
------------- PR: https://git.openjdk.org/jdk/pull/10948 From stefank at openjdk.org Fri Nov 4 09:40:30 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 4 Nov 2022 09:40:30 GMT Subject: Integrated: 8296231: Fix MEMFLAGS for CHeapBitMaps In-Reply-To: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> References: <wNxQGd-5iAY61VlRl7Y2mPKrCDtfjW3UHY9u2-GhDeI=.8de3f771-9ff2-46e0-b99d-e68003ea4b71@github.com> Message-ID: <P2khQfR_znBIu70SCBPknFtIuVpWIW3bebVSeihnmzg=.0e4c8d1a-c620-4f06-938b-47be01ff335b@github.com> On Wed, 2 Nov 2022 14:24:07 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Some usages of CHeapBitMaps rely on the default value of the MEMFLAGS argument (mtInternal). This is undesirable, and should be fixed. > > I'd prefer to remove the default value, but there is currently a PR touching the BitMap classes, so I'd like to limit this Bug to only fixing the incorrect usage of mtInternal. This pull request has now been integrated. 
Changeset: 8ee0f7d5 Author: Stefan Karlsson <stefank at openjdk.org> URL: https://git.openjdk.org/jdk/commit/8ee0f7d5982d95674cfc1b217dbabaeafefbc8f1 Stats: 27 lines in 8 files changed: 13 ins; 1 del; 13 mod 8296231: Fix MEMFLAGS for CHeapBitMaps Reviewed-by: coleenp, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/10948 From rrich at openjdk.org Fri Nov 4 09:41:37 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Nov 2022 09:41:37 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v4] In-Reply-To: <zGLXygsH37HDIQgLY2PAxX8Vf1bcnC8rQQzM6JvVGRE=.f0587c1f-23be-462a-af45-5183867ebb48@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <VUwT5X0MCtAVQ0D531lePcYXx8zOI1ckRh63_AzhHRU=.3421aa87-a000-4539-9a9d-b8091df5ac9f@github.com> <-ycncxIQyNpinFjD3jcSBESB-Yp4GGZSdZ0Eqvva7DU=.2871f915-cf60-4364-abc8-9e9d22ed8709@github.com> <763IXbUVc8wSoGTW7KQz0mUSEQQ4B7WrBUGBtdGVEM4=.2d0a35ec-253b-4df4-ba27-5dc910070f05@github.com> <RB39I-u0g0qwxdnNvtkrkiDVZTs1V9nFOI2Uz3UAHGM=.e36a5267-51f6-4bbb-9e42-96f1a2fb102f@github.com> <wxLlwo9jvWkFoA1EvJxMin-AE_F3tVD0w1yYHswHL8U=.20730d5c-c017-4559-8a20-8d132392d59d@github.com> <zGLXygsH37HDIQgLY2PAxX8Vf1bcnC8rQQzM6JvVGRE=.f0587c1f-23be-462a-af45-5183867ebb48@github.com> Message-ID: <s79CAgujbi8hgIl0ElRkTeO62lC9lr_gIw6HFchPMNg=.863df652-bc47-41a7-863f-81dd623907e2@github.com> On Fri, 4 Nov 2022 09:20:52 GMT, Fei Yang <fyang at openjdk.org> wrote: >> The value for `fsize` as it is calculated is correct on PPC64 but only because errors compensate. The calculation makes assumptions about the position of fp which is not specified. 
>>
>> This is the assumed layout:
>>
>> :                    :
>> :                    :
>> |                    |
>> |--------------------|
>> |                    |
>> | locals array       |
>> |                    |<- callers_SP
>> ======================
>> |                    |
>> | metadata at bottom |
>> |                    |<- FP
>> |--------------------|
>> |                    |
>> |                    |
>> |                    |
>> |                    |
>> |                    |<- SP
>> ======================
>>
>> Here the metadata lies outside [FP, SP] and needs to be added to FP-SP just like the size of the locals array.
>>
>> This coincidentally matches the layout on PPC64:
>>
>> :                    :
>> :                    :
>> |                    |
>> |--------------------|
>> |                    |
>> | locals array       |
>> |                    |
>> |--------------------|
>> |                    |
>> | metadata at top    |
>> |                    |<- FP / callers_SP
>> ======================
>> |                    |
>> |                    |
>> |                    |
>> |                    |
>> |--------------------|
>> |                    |
>> | metadata at top    |
>> |                    |<- SP
>> ======================
>>
>> I assume the layout on RISC-V is like this:
>>
>> :                    :
>> :                    :
>> |                    |
>> |--------------------|
>> |                    |
>> | locals array       |
>> |                    |<- FP / callers_SP
>> ======================
>> |                    |
>> | metadata at bottom |
>> |                    |
>> |--------------------|
>> |                    |
>> |                    |
>> |                    |
>> |                    |
>> |                    |<- SP
>> ======================
>>
>> Here the metadata is included in [FP, SP].
>>
>> We could change the line to
>>
>> ```c++
>> const int fsize = ContinuationHelper::InterpretedFrame::callers_sp(f) + frame::metadata_words_at_top + locals - stack_frame_top;
>> ```
>>
>> with a platform-dependent implementation of `callers_sp()`
>
> @reinrich : Yes, you are right about the frame layout of RISC-V. And your proposed solution looks reasonable to me. Since that will need testing across all platforms, maybe we can do that as a separate PR later on after this is merged. The RISCV testing platform is relatively slow and this PR has been tested for a long time. Thanks.

@RealFYang I'm ok with leaving this for later. The change is actually rather small. I've added it to the PPC64 port with https://github.com/openjdk/jdk/pull/10961/commits/0d12b0577d8314ed1d571d9cac65fa8866939180
Quick tests succeeded on X86, AARCH64 and PPC64.
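
For readers following the `fsize` discussion, here is a minimal sketch of the arithmetic behind the proposed expression. It is not part of the original message and is not HotSpot code: the struct, the helper names, and every numeric value used with it are hypothetical, chosen only to mirror the three layouts drawn above. All quantities are in words, and the stack is assumed to grow toward lower addresses.

```cpp
#include <cassert>

// Hypothetical model, NOT HotSpot code. Positions are word indices on a
// stack that grows toward lower addresses; sizes are counted in words.
struct InterpretedFrameModel {
  long sp;                     // lowest word of the frame (stack_frame_top)
  long callers_sp;             // SP of the caller
  long metadata_words_at_top;  // metadata between callers_SP and the locals
  long locals;                 // size of the locals array
};

// Same shape as the expression proposed in the thread:
//   callers_sp + metadata_words_at_top + locals - stack_frame_top
inline long frame_size_words(const InterpretedFrameModel& f) {
  return f.callers_sp + f.metadata_words_at_top + f.locals - f.sp;
}

// First diagram: FP sits below the bottom metadata, so the caller's SP is
// FP plus the number of metadata words.
inline long callers_sp_fp_below_metadata(long fp, long metadata_words) {
  return fp + metadata_words;
}

// Second and third diagrams: FP and callers_SP coincide.
inline long callers_sp_fp_at_callers_sp(long fp) {
  return fp;
}
```

The point of the sketch is that only `callers_sp` and `metadata_words_at_top` vary between layouts, while the size expression itself stays the same, which is what a platform-dependent `callers_sp()` would buy.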
------------- PR: https://git.openjdk.org/jdk/pull/10917 From luhenry at openjdk.org Fri Nov 4 10:04:05 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 4 Nov 2022 10:04:05 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v5] In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> Message-ID: <7DjEPRqC-QZmkbyyt6bNfgqDVVxjfWat5nUmtJg5iiE=.fac3d0a1-02a5-46fb-9c30-5f11fddec552@github.com> > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. > > It passes `hotspot:tier1` test suite Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10884/files - new: https://git.openjdk.org/jdk/pull/10884/files/48ab8f31..68362966 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=03-04 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10884.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10884/head:pull/10884 PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Fri Nov 4 10:04:08 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 4 Nov 2022 10:04:08 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v4] In-Reply-To: <l79EUQ77p5jB19rocBm-4V3663GHoH842J4pD_sZsYM=.140ad281-7b99-47dd-92e2-aa71d59ee1e4@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> 
<YhdnZ-IdjKAnIrjsv__vuAuxMoemXVp9pALILaVLffs=.732b1b3c-2044-4e61-bdb7-dad6dd83549f@github.com> <I8SENNacs3QgWAkim39Ay7hsqOdOZ0-gcEuFuSIC6SA=.31845e8c-95b6-4e6c-acc9-b37521342706@github.com> <l79EUQ77p5jB19rocBm-4V3663GHoH842J4pD_sZsYM=.140ad281-7b99-47dd-92e2-aa71d59ee1e4@github.com> Message-ID: <ehs8qizgzLbk3nHQKTtfB5ZMQrH7n4aBQe0sldPLsJc=.a162b618-44a6-4d0e-aaff-e72bfdb064ad@github.com> On Fri, 4 Nov 2022 08:26:51 GMT, Yadong Wang <yadongwang at openjdk.org> wrote: >> src/hotspot/cpu/riscv/riscv.ad line 5190: >> >>> 5188: >>> 5189: instruct prefetchalloc( memory mem ) %{ >>> 5190: match(PrefetchAllocation mem); >> >> PS: Should we also put this under control of option UseZicbop like you do in Prefetch::read/write? Did you checked whether those (prefetchalloc and Prefetch::read/write) will ever be used/called when AllocatePrefetchStyle is 0 (which is the case when UseZicbop is false). > > The generation of PrefetchAllocationNode is controlled by AllocatePrefetchStyle. It should be fine if UseZicbop is associated with this option. Added `predicate(UseZicbop)` just in case. ------------- PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Fri Nov 4 10:07:04 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 4 Nov 2022 10:07:04 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v6] In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> Message-ID: <hVVzJNW5i13eZ3g_aRqJJjuhs98ZOwJdAl4M2gvFNcU=.858ce35f-5e93-4dba-8ef5-1ce307ce3922@github.com> > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. 
> > It passes `hotspot:tier1` test suite Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: fixup! review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10884/files - new: https://git.openjdk.org/jdk/pull/10884/files/68362966..51725fd4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=04-05 Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/10884.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10884/head:pull/10884 PR: https://git.openjdk.org/jdk/pull/10884 From rrich at openjdk.org Fri Nov 4 10:16:12 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Nov 2022 10:16:12 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v4] In-Reply-To: <a6MAXdT5pkVxtdVKYR4VaKhWNWg3l5lw3KBwH7Wk9MI=.a4981486-26f7-40a9-bfb2-03534e053aca@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <a6MAXdT5pkVxtdVKYR4VaKhWNWg3l5lw3KBwH7Wk9MI=.a4981486-26f7-40a9-bfb2-03534e053aca@github.com> Message-ID: <9AFqhT7zp6KcpSgpo4jkMVNm9AQh0yzgnFxGKSR7YHY=.8e33abbb-549a-4e00-8aa2-253c91f77e40@github.com> On Fri, 4 Nov 2022 06:27:39 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Hi, >> >> Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. >> >> This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. >> Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't >> affect the rest of the world in theory. >> >> There exists some differences in frame structure between AArch64 and RISC-V. 
>> For AArch64, we have:
>>
>>   enum {
>>     link_offset = 0,
>>     return_addr_offset = 1,
>>     sender_sp_offset = 2
>>   };
>>
>> While for RISC-V, we have:
>>
>>   enum {
>>     link_offset = -2,
>>     return_addr_offset = -1,
>>     sender_sp_offset = 0
>>   };
>>
>> So we need adaptations in some places where the code relies on the value of sender_sp_offset to work.
>> Note that the implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to
>> evaluate more on its impact on performance.
>>
>> Testing on Linux-riscv64 HiFive Unmatched board:
>> - Minimal, Client and Server release & fastdebug build OK.
>> - Passed tier1-tier4 tests (release build).
>> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build).
>> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regressions are introduced (release build).
>
> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
> - Code cleanup
> - Merge branch 'master' into 8286301
> - Merge branch 'master' into 8286301
> - Fix
> - 8286301: JEP 425 to RISC-V

Shared changes look good. Maybe we can get rid of the RISCV64 dependency when allocating the CodeBuffer for the enter intrinsic.

Thanks, Richard.
src/hotspot/share/oops/stackChunkOop.inline.hpp line 139:

> 137: bool stackChunkOopDesc::is_usable_in_chunk(void* p) const {
> 138: #if (defined(X86) || defined(AARCH64) || defined(RISCV64)) && !defined(ZERO)
> 139: HeapWord* start = (HeapWord*)start_address() + sp() - frame::sender_sp_offset RISCV64_ONLY(- frame::metadata_words);

I think this RISCV64 special case will go away with the following change from the PPC64 port: https://github.com/openjdk/jdk/pull/10961/files#diff-563a75c6d20d6f6f35362f5afef7d4f061b9536a37f7c95c27ac37879246d899R138-R139

src/hotspot/share/runtime/sharedRuntime.cpp line 3123:

> 3121:
> 3122: if (method->is_continuation_enter_intrinsic()) {
> 3123: buffer.initialize_stubs_size(NOT_RISCV64(128) RISCV64_ONLY(192));

Can we use the maximum over all platforms? This is a temporary buffer anyway. The generated code, constants, etc. are copied to the nmethod. See https://github.com/openjdk/jdk/blob/8ee0f7d5982d95674cfc1b217dbabaeafefbc8f1/src/hotspot/share/code/nmethod.cpp#L659-L660

-------------

Changes requested by rrich (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10917 From aph at openjdk.org Fri Nov 4 10:32:30 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 4 Nov 2022 10:32:30 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v3] In-Reply-To: <HN_ppGL2xcGGzXWAs7-lw9HxLJ063u7gtfIYKB2Rg0I=.78843e37-397b-4e1f-867f-bbe974d4d7cf@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> <rNi-kFIlucDsCkWt1AGcDDz7wCDR-9USkGquiNM9_2I=.ed6b5beb-bd3c-478b-b2d4-622d476e8251@github.com> <HN_ppGL2xcGGzXWAs7-lw9HxLJ063u7gtfIYKB2Rg0I=.78843e37-397b-4e1f-867f-bbe974d4d7cf@github.com> Message-ID: <w3_d62ZmgHAQs82ehx7Ie4HrEBjgA-ntmDLyyno8Ies=.3f5e5ba5-8bdc-4676-a555-404d42a32072@github.com> On Fri, 4 Nov 2022 00:02:00 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> I did see this one trigger, otherwise I wouldn't have known about it, but I can't reproduce it today. Whether it's an assert or a guarantee depends on how serious the problem would be. > > Interesting! I do hit the assert during JDK build: > > # Internal Error (.../src/hotspot/share/oops/instanceKlass.cpp:390), pid=956, tid=6147 > # Error: assert(this_key != __null) failed > > V report_vm_error(char const*, int, char const*, char const*, ...)+0x88 > V InstanceKlass::set_nest_host(InstanceKlass*)+0x254 > V SystemDictionary::load_shared_lambda_proxy_class(InstanceKlass*, Handle, Handle, PackageEntry*, JavaThread*)+0x19c > V SystemDictionaryShared::prepare_shared_lambda_proxy_class(InstanceKlass*, InstanceKlass*, JavaThread*)+0x13c > V JVM_LookupLambdaProxyClassFromArchive+0x2cc > C Java_java_lang_invoke_LambdaProxyClassArchive_findFromArchive+0x4c > j java.lang.invoke.LambdaProxyClassArchive.findFromArchive(...) java.base at 20-internal > ... > > > Looks like a pre-existing bug to me. OK! I'll do a bit more digging. 
------------- PR: https://git.openjdk.org/jdk/pull/10920 From rrich at openjdk.org Fri Nov 4 11:08:35 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Nov 2022 11:08:35 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v4] In-Reply-To: <9AFqhT7zp6KcpSgpo4jkMVNm9AQh0yzgnFxGKSR7YHY=.8e33abbb-549a-4e00-8aa2-253c91f77e40@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <a6MAXdT5pkVxtdVKYR4VaKhWNWg3l5lw3KBwH7Wk9MI=.a4981486-26f7-40a9-bfb2-03534e053aca@github.com> <9AFqhT7zp6KcpSgpo4jkMVNm9AQh0yzgnFxGKSR7YHY=.8e33abbb-549a-4e00-8aa2-253c91f77e40@github.com> Message-ID: <MfKWek0dQuOOStu0LkxErqMvAlVmOl5L_oNSmPlJSCA=.c3cdb432-ceef-49a0-8475-bbb22b5e5293@github.com> On Fri, 4 Nov 2022 09:56:16 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Code cleanup >> - Merge branch 'master' into 8286301 >> - Merge branch 'master' into 8286301 >> - Fix >> - 8286301: JEP 425 to RISC-V > > src/hotspot/share/oops/stackChunkOop.inline.hpp line 139: > >> 137: bool stackChunkOopDesc::is_usable_in_chunk(void* p) const { >> 138: #if (defined(X86) || defined(AARCH64) || defined(RISCV64)) && !defined(ZERO) >> 139: HeapWord* start = (HeapWord*)start_address() + sp() - frame::sender_sp_offset RISCV64_ONLY(- frame::metadata_words); > > I think this RISCV64 special case will go away with the following change form the PPC64 port: https://github.com/openjdk/jdk/pull/10961/files#diff-563a75c6d20d6f6f35362f5afef7d4f061b9536a37f7c95c27ac37879246d899R138-R139 Also `frame::sender_sp_offset` is a platform detail that cannot be used in shared code. 
Since `metadata_words = sender_sp_offset` on other platforms I'd suggest you replace `frame::sender_sp_offset` with `frame::metadata_words`. Probably this has been forgotten. ------------- PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Fri Nov 4 11:10:30 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 4 Nov 2022 11:10:30 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v6] In-Reply-To: <hVVzJNW5i13eZ3g_aRqJJjuhs98ZOwJdAl4M2gvFNcU=.858ce35f-5e93-4dba-8ef5-1ce307ce3922@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <hVVzJNW5i13eZ3g_aRqJJjuhs98ZOwJdAl4M2gvFNcU=.858ce35f-5e93-4dba-8ef5-1ce307ce3922@github.com> Message-ID: <EuERyXkxQLgkdzbK6IwOOIInHKaSZapy8vB6jYzEPj4=.e3da9280-fef8-4bac-a625-3bcb1600552a@github.com> On Fri, 4 Nov 2022 10:07:04 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. >> >> It passes `hotspot:tier1` test suite > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > fixup! review Still looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10884

From fyang at openjdk.org Fri Nov 4 11:42:28 2022
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 4 Nov 2022 11:42:28 GMT
Subject: RFR: 8286301: Port JEP 425 to RISC-V [v5]
In-Reply-To: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com>
References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com>
Message-ID: <MVDmwCG_U2khyoo3NdlYMjVvCzd5CKaVp99L-fKjyKM=.3e4976f1-e24b-4730-9565-45fa635770a9@github.com>

> Hi,
>
> Please review this PR porting JEP 425 (Virtual Threads) to RISC-V.
>
> This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope.
> Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't
> affect the rest of the world in theory.
>
> There exist some differences in frame structure between AArch64 and RISC-V.
> For AArch64, we have:
>
>   enum {
>     link_offset = 0,
>     return_addr_offset = 1,
>     sender_sp_offset = 2
>   };
>
> While for RISC-V, we have:
>
>   enum {
>     link_offset = -2,
>     return_addr_offset = -1,
>     sender_sp_offset = 0
>   };
>
> So we need adaptations in some places where the code relies on the value of sender_sp_offset to work.
> Note that the implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to
> evaluate more on its impact on performance.
>
> Testing on Linux-riscv64 HiFive Unmatched board:
> - Minimal, Client and Server release & fastdebug build OK.
> - Passed tier1-tier4 tests (release build).
> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build).
> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regressions are introduced (release build).
Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10917/files - new: https://git.openjdk.org/jdk/pull/10917/files/850c8958..96109879 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10917&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10917&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10917.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10917/head:pull/10917 PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Fri Nov 4 11:48:33 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 4 Nov 2022 11:48:33 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v4] In-Reply-To: <MfKWek0dQuOOStu0LkxErqMvAlVmOl5L_oNSmPlJSCA=.c3cdb432-ceef-49a0-8475-bbb22b5e5293@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <a6MAXdT5pkVxtdVKYR4VaKhWNWg3l5lw3KBwH7Wk9MI=.a4981486-26f7-40a9-bfb2-03534e053aca@github.com> <9AFqhT7zp6KcpSgpo4jkMVNm9AQh0yzgnFxGKSR7YHY=.8e33abbb-549a-4e00-8aa2-253c91f77e40@github.com> <MfKWek0dQuOOStu0LkxErqMvAlVmOl5L_oNSmPlJSCA=.c3cdb432-ceef-49a0-8475-bbb22b5e5293@github.com> Message-ID: <ECQsDkIZnHvtcYiplOM4qrEOV0txD4FTSOsGfVfh1VU=.637c3572-cb52-4889-ad83-856c004332cd@github.com> On Fri, 4 Nov 2022 11:04:38 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> src/hotspot/share/oops/stackChunkOop.inline.hpp line 139: >> >>> 137: bool stackChunkOopDesc::is_usable_in_chunk(void* p) const { >>> 138: #if (defined(X86) || defined(AARCH64) || defined(RISCV64)) && !defined(ZERO) >>> 139: HeapWord* start = (HeapWord*)start_address() + sp() - frame::sender_sp_offset RISCV64_ONLY(- frame::metadata_words); >> >> I think this RISCV64 special case will go away with the following change form the PPC64 port: 
https://github.com/openjdk/jdk/pull/10961/files#diff-563a75c6d20d6f6f35362f5afef7d4f061b9536a37f7c95c27ac37879246d899R138-R139
>
> Also `frame::sender_sp_offset` is a platform detail that cannot be used in shared code. Since `metadata_words = sender_sp_offset` on other platforms I'd suggest you replace `frame::sender_sp_offset` with `frame::metadata_words`. Probably this has been forgotten.

Yes, that makes sense. Fixed.

-------------

PR: https://git.openjdk.org/jdk/pull/10917

From fyang at openjdk.org Fri Nov 4 11:48:33 2022
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 4 Nov 2022 11:48:33 GMT
Subject: RFR: 8286301: Port JEP 425 to RISC-V [v5]
In-Reply-To: <s79CAgujbi8hgIl0ElRkTeO62lC9lr_gIw6HFchPMNg=.863df652-bc47-41a7-863f-81dd623907e2@github.com>
References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com>
 <VUwT5X0MCtAVQ0D531lePcYXx8zOI1ckRh63_AzhHRU=.3421aa87-a000-4539-9a9d-b8091df5ac9f@github.com>
 <-ycncxIQyNpinFjD3jcSBESB-Yp4GGZSdZ0Eqvva7DU=.2871f915-cf60-4364-abc8-9e9d22ed8709@github.com>
 <763IXbUVc8wSoGTW7KQz0mUSEQQ4B7WrBUGBtdGVEM4=.2d0a35ec-253b-4df4-ba27-5dc910070f05@github.com>
 <RB39I-u0g0qwxdnNvtkrkiDVZTs1V9nFOI2Uz3UAHGM=.e36a5267-51f6-4bbb-9e42-96f1a2fb102f@github.com>
 <wxLlwo9jvWkFoA1EvJxMin-AE_F3tVD0w1yYHswHL8U=.20730d5c-c017-4559-8a20-8d132392d59d@github.com>
 <zGLXygsH37HDIQgLY2PAxX8Vf1bcnC8rQQzM6JvVGRE=.f0587c1f-23be-462a-af45-5183867ebb48@github.com>
 <s79CAgujbi8hgIl0ElRkTeO62lC9lr_gIw6HFchPMNg=.863df652-bc47-41a7-863f-81dd623907e2@github.com>
Message-ID: <8U8hFi4TGGLvndzmFJzvyZnFPY4mOj5ifQG4J4sQLU4=.fdd88e72-c7de-490d-917c-b2c16048321a@github.com>

On Fri, 4 Nov 2022 09:37:39 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

>> @reinrich : Yes, you are right about the frame layout of RISC-V. And your proposed solution looks reasonable to me. Since that will need testing across all platforms, maybe we can do that as a separate PR later on after this is merged.
The RISCV testing platform is relatively slow and this PR has been tested for a long time. Thanks. > > @RealFYang I'm ok with leaving this for later. The change is actually rather small. I've added it to the PPC64 port with https://github.com/openjdk/jdk/pull/10961/commits/0d12b0577d8314ed1d571d9cac65fa8866939180 > Quick tests succeeded on X86, AARCH64 and PPC64. @reinrich : That's great to hear. Thank you. After this PR is merged, I can help test your changes on RISC-V when you rebase. ------------- PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Fri Nov 4 11:48:38 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 4 Nov 2022 11:48:38 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v4] In-Reply-To: <9AFqhT7zp6KcpSgpo4jkMVNm9AQh0yzgnFxGKSR7YHY=.8e33abbb-549a-4e00-8aa2-253c91f77e40@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <a6MAXdT5pkVxtdVKYR4VaKhWNWg3l5lw3KBwH7Wk9MI=.a4981486-26f7-40a9-bfb2-03534e053aca@github.com> <9AFqhT7zp6KcpSgpo4jkMVNm9AQh0yzgnFxGKSR7YHY=.8e33abbb-549a-4e00-8aa2-253c91f77e40@github.com> Message-ID: <78GR-GtaxX3KtFAxIEXzO5mdo5gRerjup0Oh2hTfXpU=.f1b83535-ad92-44b1-b3b0-b7f8114a0b37@github.com> On Fri, 4 Nov 2022 10:07:30 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Code cleanup >> - Merge branch 'master' into 8286301 >> - Merge branch 'master' into 8286301 >> - Fix >> - 8286301: JEP 425 to RISC-V > > src/hotspot/share/runtime/sharedRuntime.cpp line 3123: > >> 3121: >> 3122: if (method->is_continuation_enter_intrinsic()) { >> 3123: buffer.initialize_stubs_size(NOT_RISCV64(128) RISCV64_ONLY(192)); > > Can we use the maximum over all platforms? This is a temporary buffer anyway. 
The generated code, constants, etc. is copied to the nmethod. See https://github.com/openjdk/jdk/blob/8ee0f7d5982d95674cfc1b217dbabaeafefbc8f1/src/hotspot/share/code/nmethod.cpp#L659-L660 OK, I have set the default size to 192 for all platforms. ------------- PR: https://git.openjdk.org/jdk/pull/10917 From matsaave at openjdk.org Fri Nov 4 13:32:02 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 4 Nov 2022 13:32:02 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v10] In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: <IUV7KRpBkMYCR-v-HVfoluz_S75wM87fVwYh5Fbgjsk=.25e60726-facf-4d78-8e2c-c73ec6566815@github.com> > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. 
> > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 <Concat0.s/Ljava/lang/String;> > 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> > BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' 
@16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 <Concat0.d/Ljava/lang/String;> > 18 iinc #1 1 > 21 goto 2 > 24 return Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10860/files - new: https://git.openjdk.org/jdk/pull/10860/files/7cfe1f5d..d716d2e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=08-09 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From lkorinth at openjdk.org Fri Nov 4 13:48:26 2022 From: lkorinth at openjdk.org (Leo Korinth) Date: Fri, 4 Nov 2022 13:48:26 GMT Subject: RFR: 8296401: ConcurrentHashTable::bulk_delete might miss to delete some objects Message-ID: <rWw3Hb1baG5eslMgFnOdEP1Q4avhfYc3uTpjdB9DaP0=.23f34ff0-eff9-421b-9e70-7f119fd90e5d@github.com> ConcurrentHashTable::bulk_delete might miss to delete some objects if a bucket has more than 256 entries. 
Current uses of ConcurrentHashTable are not harmed by this behaviour. I modified gtest:ConcurrentHashTable to detect the problem (first commit), and fixed the problem in the code (second commit). Tests pass tier1-3. ------------- Commit messages: - working! - test Changes: https://git.openjdk.org/jdk/pull/10983/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10983&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296401 Stats: 101 lines in 2 files changed: 47 ins; 19 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/10983.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10983/head:pull/10983 PR: https://git.openjdk.org/jdk/pull/10983 From rrich at openjdk.org Fri Nov 4 14:40:42 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Nov 2022 14:40:42 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v5] In-Reply-To: <MVDmwCG_U2khyoo3NdlYMjVvCzd5CKaVp99L-fKjyKM=.3e4976f1-e24b-4730-9565-45fa635770a9@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <MVDmwCG_U2khyoo3NdlYMjVvCzd5CKaVp99L-fKjyKM=.3e4976f1-e24b-4730-9565-45fa635770a9@github.com> Message-ID: <r42XZNXLR_DX6D3zmEz_uvW1dLHJsd4S_UP0enVxBxc=.6d87bbae-66e4-4dfc-8ba8-80ff1c25a177@github.com> On Fri, 4 Nov 2022 11:42:28 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Hi, >> >> Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. >> >> This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. >> Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't >> affect the rest of the world in theory. >> >> There exist some differences in frame structure between AArch64 and RISC-V.
>> For AArch64, we have: >> >> enum { >> link_offset = 0, >> return_addr_offset = 1, >> sender_sp_offset = 2 >> }; >> >> While for RISC-V, we have: >> >> enum { >> link_offset = -2, >> return_addr_offset = -1, >> sender_sp_offset = 0 >> }; >> >> So we need adaptations in some places where the code relies on the value of sender_sp_offset to work. >> Note that implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to >> evaluate more on its impact on performance. >> >> Testing on Linux-riscv64 HiFive Unmatched board: >> - Minimal, Client and Server release & fastdebug build OK. >> - Passed tier1-tier4 tests (release build). >> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). >> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regressions are introduced (release build). > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Review Looks good, thanks! Richard. src/hotspot/share/oops/stackChunkOop.inline.hpp line 139: > 137: bool stackChunkOopDesc::is_usable_in_chunk(void* p) const { > 138: #if (defined(X86) || defined(AARCH64) || defined(RISCV64)) && !defined(ZERO) > 139: HeapWord* start = (HeapWord*)start_address() + sp() - frame::metadata_words; This looks platform independent now, doesn't it? I think the cpp conditional can be removed. You can leave it also to the PPC64 port as I'll touch that line again. As you like. ------------- Marked as reviewed by rrich (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10917 From jnimeh at openjdk.org Fri Nov 4 14:40:45 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Fri, 4 Nov 2022 14:40:45 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7] In-Reply-To: <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> Message-ID: <GuAp66LcJ1C9QrjIKi-UfLCbz-YS9subHRLGcjXrlUM=.d1599b0d-78c1-4da8-bce3-2705602f92ab@github.com> On Fri, 4 Nov 2022 03:20:11 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 
154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge remote-tracking branch 'origin/master' into avx512-poly > - address Jamil's review > - invalidkeyexception and some review comments > - extra whitespace character > - assembler checks and test case fixes > - Merge remote-tracking branch 'origin/master' into avx512-poly > - Merge remote-tracking branch 'origin' into avx512-poly > - further restrict UsePolyIntrinsics with supports_avx512vlbw > - missed white-space fix > - - Fix whitespace and copyright statements > - Add benchmark > - ... and 2 more: https://git.openjdk.org/jdk/compare/9d3b4ef2...38d9e83c Regarding the updated numbers and master v. optimized-and-disabled, those are looking pretty good. Looks like the break-even point is at 64 bytes and gets better from there which I think addresses my concerns. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Nov 4 14:40:45 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 4 Nov 2022 14:40:45 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <mymD7nKP6xLz1XoansrlVbhxo6EK0Zefc5OJ_WFyf3g=.33270ab8-22af-4cc3-b0ca-eb364624925a@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> <BKHPNZFMR60W5dFzqO3sps0PKpxX2ggEjnK6yqKT_RE=.e531ff46-43cb-4ed3-b3f9-103ab0c7a8e5@github.com> <Cju0XRYyqdeVeZZEpttxF9eUX02aQ0UM3CfD5uIe0OI=.6057ecb9-c80f-4eef-8aae-9f8748bc8f6c@github.com> <4HxTb1DtD6KeuYupOKf32GoQ7SV8_EjHcqfhiZhbLHM=.884e631a-1336-454d-aae1-06f85f784381@github.com> <mymD7nKP6xLz1XoansrlVbhxo6EK0Zefc5OJ_WFyf3g=.33270ab8-22af-4cc3-b0ca-eb364624925a@github.com> Message-ID: <2_5CBS8aY7mUfXAvavQiM66Xg7ZNmjDiM6YM1vADCM4=.c0b0cc5e-9257-4d65-82cc-1cfd5523b554@github.com> On Wed, 2 Nov 2022 03:16:57 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: >>> And just looking now on uops.info, they seem to have identical timings? >> >> Actual instruction being used (aligned vs unaligned versions) doesn't matter much here, because it's a dynamic property of the address being accessed: misaligned accesses that cross cache line boundary incur a penalty. Since cache lines are 64 bytes in size, every misaligned 512-bit access is penalized. > > I collected performance counters for the benchmark included with the patch and its showing around 30% of 64 byte loads were spanning across the cache line. 
> > Performance counter stats for 'java -jar target/benchmarks.jar -f 1 -wi 1 -i 2 -w 30 -p dataSize=8192': > > 122385646614 cycles > 328096538160 instructions # 2.68 insn per cycle > 64530343063 MEM_INST_RETIRED.ALL_LOADS > 22900705491 MEM_INST_RETIRED.ALL_STORES > 19815558484 MEM_INST_RETIRED.SPLIT_LOADS > 701176106 MEM_INST_RETIRED.SPLIT_STORES > > Presence of scalar peel loop before the vector loop can save this penalty but given its operating over block streams it may be tricky. > We should also extend the scope of optimization (preferably in this PR or in subsequent one) to optimize [MAC computation routine accepting ByteBuffer.](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java#L116), To close this thread.. @jatin-bhateja and I talked and realized that it is not possible to re-align input here. At least not with peeling with scalar loop. Scalar loop peels full blocks only (i.e. 16 bytes at a time). So out of 64 positions, 1 is already aligned, 3 could be aligned with the right peel, and 60 will land badly regardless. 
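
The counting argument above (1 offset already aligned, 3 alignable, 60 not) can be checked with a small sketch. This is only an illustration of the arithmetic, assuming 64-byte cache lines and 16-byte blocks as stated in the thread; the class and method names are made up for the example and are not part of the patch.

```java
public final class PeelAlignment {
    static final int CACHE_LINE = 64; // bytes, as stated in the thread
    static final int BLOCK = 16;      // Poly1305 block size in bytes

    /**
     * Number of whole 16-byte blocks a scalar loop would have to peel so that
     * the given start offset becomes 64-byte aligned, or -1 if no number of
     * whole blocks can achieve that.
     */
    static int blocksToPeel(int offset) {
        for (int k = 0; k < CACHE_LINE / BLOCK; k++) {
            if ((offset + k * BLOCK) % CACHE_LINE == 0) {
                return k;
            }
        }
        return -1;
    }

    /** Returns {alreadyAligned, alignableByPeeling, neverAligned} over all 64 start offsets. */
    static int[] classify() {
        int aligned = 0, alignable = 0, never = 0;
        for (int offset = 0; offset < CACHE_LINE; offset++) {
            int k = blocksToPeel(offset);
            if (k == 0) {
                aligned++;
            } else if (k > 0) {
                alignable++;
            } else {
                never++;
            }
        }
        return new int[] {aligned, alignable, never};
    }

    public static void main(String[] args) {
        int[] c = classify();
        System.out.println(c[0] + " aligned, " + c[1] + " alignable, " + c[2] + " never aligned");
    }
}
```

Peeling whole blocks only moves the offset in steps of 16, so only offsets 0, 16, 32 and 48 can ever reach a multiple of 64.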
------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Nov 4 14:40:45 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 4 Nov 2022 14:40:45 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <4AB7TAZwydDonBwfxasMLmgVIQuaLgMUxck7eCbzYxw=.a9062602-90d4-4bde-baff-629bea466527@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> <rWyVuKzBYxwlH2VD7NTEhn3RmEkPD7Y1xZxLdzRC9PU=.afb331b2-0940-46ff-8801-2debf135cdc9@github.com> <KwT1ejhy7yvVIpMPCpkT2E87-bwwXRvVhZ-fy_Bm8po=.d71506cf-3e0c-4eb8-a008-390ab2ff4e02@github.com> <cjsKh51GqdfZ4GnIcniUrJfX7McnzDOz_cSf3BSR4FA=.4181923c-e89d-4e1d-902a-0b390b515472@github.com> <SajtNyPTRfsIaMh_o-fFN0Q1HYmehmUl-V5QCWm5jp4=.56a900ad-c725-4577-89d1-8ea860776d88@github.com> <SHVra8SjH-iWXZ-s-f4cWtbylOAg8rlGNQc3cVxTFRI=.c2d808b3-e5cb-4131-8c0b-e1c873aa5b3c@github.com> <ncjUr-Adsj7uaO29FWJ6k0umBrEXfv7fDg0x8cnE6n0=.4dee4a46-c702-44b3-a5bf-187263e9ac2d@github.com> <4AB7TAZwydDonBwfxasMLmgVIQuaLgMUxck7eCbzYxw=.a9062602-90d4-4bde-baff-629bea466527@github.com> Message-ID: <tZyP57PymZoHKFEd2x4U9muSBSp6taAXyu-GQdhwkzw=.2c78e05a-eacd-4315-b1a3-d96db81b58d7@github.com> On Fri, 28 Oct 2022 20:58:33 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> No, going the WhiteBox route was not something I was thinking of. I sought feedback from a couple hotspot-knowledgable people about the use of WhiteBox APIs and both felt that it was not the right way to go. One said that WhiteBox is really for VM testing and not for these kinds of java classes. > > One idea I was trying to measure was to make the intrinsic (i.e. 
the while loop remains exactly the same, just moved to different =non-static= function): > > private void processMultipleBlocks(byte[] input, int offset, int length) { //, MutableIntegerModuloP A, IntegerModuloP R) { > while (length >= BLOCK_LENGTH) { > n.setValue(input, offset, BLOCK_LENGTH, (byte)0x01); > a.setSum(n); // A += (temp | 0x01) > a.setProduct(r); // A = (A * R) % p > offset += BLOCK_LENGTH; > length -= BLOCK_LENGTH; > } > } > > > In principle, the java version would not get any slower (i.e. there is only one extra function jump). At the expense of the C++ glue getting more complex. In C++ I need to dig out using IR `(sun.security.util.math.intpoly.IntegerPolynomial.MutableElement)(this.a).limbs` then convert 5*26bit limbs into 3*44-bit limbs. The IR is very new to me so will take some time. (I think I found some AES code that does something similar). > > That said.. I thought this idea would had been perhaps a separate PR, if needed at all.. Digging limbs out is one thing, but also need to add asserts and safety. Mostly would be happy to just measure if its worth it. 
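
For readers unfamiliar with the limb conversion mentioned in the quoted text, here is a rough sketch of repacking five 26-bit limbs into three 44-bit limbs. It assumes the limbs are non-negative and fully reduced (each below 2^26), which may not hold for the intermediate values in the real `IntegerPolynomial` code; the class and method names are hypothetical, and this is not the code from the PR.

```java
import java.math.BigInteger;

public final class LimbRepack {
    static final long MASK44 = (1L << 44) - 1;

    /**
     * Repack a 130-bit value from 5 limbs of 26 bits (little-endian limb order)
     * into 3 limbs of 44, 44 and 42 bits.
     * Assumption: every input limb is in the range [0, 2^26).
     */
    static long[] repack26to44(long[] a) {
        if (a.length != 5) {
            throw new IllegalArgumentException("expected 5 limbs");
        }
        // Bits 0..43: all of a[0] plus the low 18 bits of a[1].
        long l0 = (a[0] | (a[1] << 26)) & MASK44;
        // Bits 44..87: high 8 bits of a[1], all of a[2], low 10 bits of a[3].
        long l1 = ((a[1] >>> 18) | (a[2] << 8) | (a[3] << 34)) & MASK44;
        // Bits 88..129: high 16 bits of a[3] plus all of a[4].
        long l2 = (a[3] >>> 10) | (a[4] << 16);
        return new long[] {l0, l1, l2};
    }

    /** Reassemble the integer represented by little-endian limbs of the given width. */
    static BigInteger value(long[] limbs, int bitsPerLimb) {
        BigInteger v = BigInteger.ZERO;
        for (int i = limbs.length - 1; i >= 0; i--) {
            v = v.shiftLeft(bitsPerLimb).add(BigInteger.valueOf(limbs[i]));
        }
        return v;
    }

    /** True if both representations encode the same integer. */
    static boolean sameValue(long[] a) {
        return value(a, 26).equals(value(repack26to44(a), 44));
    }

    public static void main(String[] args) {
        long[] sample = {0x3ffffffL, 0x1234567L, 0L, 0x2abcdefL, 0x3ffffffL};
        System.out.println("representations agree: " + sameValue(sample));
    }
}
```

The `BigInteger` helper is there only to cross-check the bit arithmetic; an intrinsic would do the equivalent shifts and masks directly on registers.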
thread resumed below ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Nov 4 14:40:46 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 4 Nov 2022 14:40:46 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7] In-Reply-To: <523ASDMlZe7mAZaBQe3ipxBLaLum7_XZqLLUUgsCJi0=.db28f521-c957-4fb2-8dcc-7c09d46189e3@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <523ASDMlZe7mAZaBQe3ipxBLaLum7_XZqLLUUgsCJi0=.db28f521-c957-4fb2-8dcc-7c09d46189e3@github.com> Message-ID: <jdvVk9YyVGlfhzbzbYo9A1IbGKQdWjFSx6-PKDh8RvA=.275f6e48-b15d-4aaa-8da7-73b3f752c3c3@github.com> On Tue, 18 Oct 2022 22:51:51 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - address Jamil's review >> - invalidkeyexception and some review comments >> - extra whitespace character >> - assembler checks and test case fixes >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Merge remote-tracking branch 'origin' into avx512-poly >> - further restrict UsePolyIntrinsics with supports_avx512vlbw >> - missed white-space fix >> - - Fix whitespace and copyright statements >> - Add benchmark >> - ... and 2 more: https://git.openjdk.org/jdk/compare/9d3b4ef2...38d9e83c > > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 286: > >> 284: * numeric values. >> 285: */ >> 286: private void setRSVals() { //throws InvalidKeyException { > > The R and S check for invalid key (all bytes zero) could be submitted as a separate PR. > It is not related to the Poly1305 acceleration. 
done, added a flag ------------- PR: https://git.openjdk.org/jdk/pull/10582 From jnimeh at openjdk.org Fri Nov 4 16:32:16 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Fri, 4 Nov 2022 16:32:16 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7] In-Reply-To: <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> Message-ID: <f0Bv-td_2zmCQ7F8RjVS-O5SK36DolXJt83FD34zsuY=.d5336884-e8ae-4f01-bf23-47f6fec785c7@github.com> On Fri, 4 Nov 2022 03:20:11 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 
1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge remote-tracking branch 'origin/master' into avx512-poly > - address Jamil's review > - invalidkeyexception and some review comments > - extra whitespace character > - assembler checks and test case fixes > - Merge remote-tracking branch 'origin/master' into avx512-poly > - Merge remote-tracking branch 'origin' into avx512-poly > - further restrict UsePolyIntrinsics with supports_avx512vlbw > - missed white-space fix > - - Fix whitespace and copyright statements > - Add benchmark > - ... and 2 more: https://git.openjdk.org/jdk/compare/9d3b4ef2...38d9e83c src/hotspot/share/opto/library_call.cpp line 7036: > 7034: assert(r_start, "r array is NULL"); > 7035: > 7036: Node* call = make_runtime_call(RC_LEAF, Can we safely change this to `RC_LEAF | RC_NO_FP`? For the ChaCha20 block intrinsic I'm working on I've been using that parameter because I'm not touching the FP registers and that looks to be the case here (though your intrinsic is a lot more complicated than mine so I may have missed something). I believe the GHASH and AES library call routines also call `make_runtime_call()` in this way. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From tschatzl at openjdk.org Fri Nov 4 17:29:27 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 4 Nov 2022 17:29:27 GMT Subject: RFR: 8295871: G1: Use different explicit claim marks for CLDs Message-ID: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> Hi all, can I have reviews for this follow-up to [JDK-8295118](https://bugs.openjdk.org/browse/JDK-8295118) that removes the need to clear CLD claim marks for every full gc phase by using different claim values for the different phases. Some comments: * I used new g1 specific claim values instead of overloading the existing ones, which is imho clearer. I am open to better names, but something like `_claim_strong_2/3` seemed too cryptic. Then again, there is now a collector specific name in the enum. Maybe the enum values should be made collector-specific in some way? Currently they already are (e.g. `_claim_finalizable` is only used in ZGC) as G1 does not need the values except for (multiple) `_claim_strong`. * I moved the CLD mark verification for the mark phase from `prepare_collection` to the constructor of `G1FullGCMarker`; I think this place is more fitting as directly above there is the use in the `CLDToOopClosure`. Also this pattern aligns with the use in the `G1FullGCAdjustTask`. 
Testing: tier1-5 Thanks, Thomas ------------- Commit messages: - Missing include - Fix indentation - initial implementation Changes: https://git.openjdk.org/jdk/pull/10989/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10989&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295871 Stats: 20 lines in 5 files changed: 7 ins; 5 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/10989.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10989/head:pull/10989 PR: https://git.openjdk.org/jdk/pull/10989 From ascarpino at openjdk.org Fri Nov 4 17:29:47 2022 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Fri, 4 Nov 2022 17:29:47 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7] In-Reply-To: <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> Message-ID: <_KivimSjPXP-a8M1gaVOxawfozZ8K4mOkvWwb1w00J8=.70c924a0-8d7e-4673-977c-01d1044ae73c@github.com> On Fri, 4 Nov 2022 03:20:11 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. 
>> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge remote-tracking branch 'origin/master' into avx512-poly > - address Jamil's review > - invalidkeyexception and some review comments > - extra whitespace character > - assembler checks and test case fixes > - Merge remote-tracking branch 'origin/master' into avx512-poly > - Merge remote-tracking branch 'origin' into avx512-poly > - further restrict UsePolyIntrinsics with supports_avx512vlbw > - missed white-space fix > - - Fix whitespace and copyright statements > - Add benchmark > - ... and 2 more: https://git.openjdk.org/jdk/compare/9d3b4ef2...38d9e83c Thanks for moving the conversion of R and A from the java code into the intrinsic. That certainly reduced the footprint on the java code with regard to performance and code flow. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Nov 4 17:29:48 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 4 Nov 2022 17:29:48 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7] In-Reply-To: <f0Bv-td_2zmCQ7F8RjVS-O5SK36DolXJt83FD34zsuY=.d5336884-e8ae-4f01-bf23-47f6fec785c7@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> <f0Bv-td_2zmCQ7F8RjVS-O5SK36DolXJt83FD34zsuY=.d5336884-e8ae-4f01-bf23-47f6fec785c7@github.com> Message-ID: <yakJWzO8mmn5VJ0WpAEAwo1DHqpq2xkkD-oQol0gEhA=.dc776ad9-a666-4928-b59c-087af1052508@github.com> On Fri, 4 Nov 2022 16:28:51 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - address Jamil's review >> - invalidkeyexception and some review comments >> - extra whitespace character >> - assembler checks and test case fixes >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Merge remote-tracking branch 'origin' into avx512-poly >> - further restrict UsePolyIntrinsics with supports_avx512vlbw >> - missed white-space fix >> - - Fix whitespace and copyright statements >> - Add benchmark >> - ... and 2 more: https://git.openjdk.org/jdk/compare/9d3b4ef2...38d9e83c > > src/hotspot/share/opto/library_call.cpp line 7036: > >> 7034: assert(r_start, "r array is NULL"); >> 7035: >> 7036: Node* call = make_runtime_call(RC_LEAF, > > Can we safely change this to `RC_LEAF | RC_NO_FP`? 
For the ChaCha20 block intrinsic I'm working on I've been using that parameter because I'm not touching the FP registers and that looks to be the case here (though your intrinsic is a lot more complicated than mine so I may have missed something). I believe the GHASH and AES library call routines also call `make_runtime_call()` in this way. Makes sense to me, will put it in and re-test (no fp registers anywhere in the intrinsic). Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10582 From mcimadamore at openjdk.org Fri Nov 4 18:23:17 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 4 Nov 2022 18:23:17 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v2] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <IxHlukr_bx6t1miZypvnq_8eWAgyouVs1mdNOhFW3bE=.4efc799a-206e-4f05-a28f-fab819cd4334@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: - Merge branch 'master' into PR_20 - Merge branch 'master' into PR_20 - Merge pull request #14 from minborg/small-javadoc Update some javadocs - Update some javadocs - Revert some javadoc changes - Merge branch 'master' into PR_20 - Fix benchmark and test failure - Merge pull request #13 from minborg/revert-factories Revert MemorySegment factories - Update javadocs after comments - Revert MemorySegment factories - ... 
and 7 more: https://git.openjdk.org/jdk/compare/7eb59e41...3d933028 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/ac7733da..3d933028 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=00-01 Stats: 51371 lines in 672 files changed: 16181 ins; 32391 del; 2799 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From duke at openjdk.org Fri Nov 4 18:26:35 2022 From: duke at openjdk.org (duke) Date: Fri, 4 Nov 2022 18:26:35 GMT Subject: Withdrawn: 8291237: Encapsulate nmethod Deoptimization logic In-Reply-To: <y89Onoa5PBtF_mjunbnCbJ64S2QVa4yL_OXWlJwjHwE=.d7482bbb-5c4c-4105-84f6-80db333ec50c@github.com> References: <y89Onoa5PBtF_mjunbnCbJ64S2QVa4yL_OXWlJwjHwE=.d7482bbb-5c4c-4105-84f6-80db333ec50c@github.com> Message-ID: <T52MSNZ18HWgWeEHWL4_Vw3DVTRoHHv9H7oogesliMw=.da6009e7-b35a-4656-a2f1-baedfb2bafcb@github.com> On Wed, 27 Jul 2022 12:55:04 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > The proposal is to encapsulate the nmethod mark for deoptimization logic in one place and only allow access to the `mark_for_deoptimization` from a closure object: > ```C++ > class DeoptimizationMarkerClosure : StackObj { > public: > virtual void marker_do(Deoptimization::MarkFn mark_fn) = 0; > }; > > This closure takes a `MarkFn` which it uses to mark which nmethods should be deoptimized. This marking can only be done through the `MarkFn` and a `MarkFn` can only be created in the following code which runs the closure. 
> ```C++
> {
>   NoSafepointVerifier nsv;
>   assert_locked_or_safepoint(Compile_lock);
>   marker_closure.marker_do(MarkFn());
>   anything_deoptimized = deoptimize_all_marked();
> }
> if (anything_deoptimized) {
>   run_deoptimize_closure();
> }
> ```
> 
> This ensures that this logic is encapsulated, and the `NoSafepointVerifier` together with `assert_locked_or_safepoint(Compile_lock)` makes it sound for `deoptimize_all_marked` not to scan the whole code cache.
> 
> The exception to this pattern, from `InstanceKlass::unload_class`, is discussed in the JBS issue, which gives reasons why not marking for deoptimization there is ok.
> 
> An effect of this encapsulation is that the deoptimization logic was moved from the `CodeCache` class to the `Deoptimization` class and the class redefinition logic was moved from the `CodeCache` class to the `VM_RedefineClasses` class/operation.
> 
> Testing: Tier 1-5
> 
> _Update_
> ---
> Switched to using an RAII object to track the context instead of putting code in a closure. But all the encapsulation is still the same.
> 
> Testing: Tier 1-7
> 
> _Update_
> ---
>> @stefank suggested splitting out unloading klass logic change into a separate issue [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718).
>> 
>> Will probably also limit this PR to only encapsulation (skipping the linked list optimisation) and create a separate issue for that as well.
>> 
>> But this creates a chain of three dependent issues. [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237) depends on [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718). And the linked list optimisation will depend on [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237).
>> 
>> Will mark this as a draft for now and create a PR for [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718) first.
> 
> _Update_
> ---
> Testing after 11d9dd2: Oracle platforms tier 1-5

This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.org/jdk/pull/9655 From matsaave at openjdk.org Fri Nov 4 19:07:29 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 4 Nov 2022 19:07:29 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v11] In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: <LSPP2f1ESqAS83wDjpnScW9ZL-IBFoAGaUy6XN960cU=.e32e5d6c-5b3e-4834-9c83-3bd7c8a005be@github.com> > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. > > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 <Concat0.s/Ljava/lang/String;> > 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> > BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 
00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 <Concat0.d/Ljava/lang/String;> > 18 iinc #1 1 > 21 goto 2 > 24 return Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into invokeDynamicPrinter - Fixed copyright - Fixed gtest - Fixed code formatting - Improved formatting - Added gtest - changed NULL to nullptr - Added null check and resource mark - fixed last trailing whitespace - Removed trailing whitespace - ... and 2 more: https://git.openjdk.org/jdk/compare/ba76c801...5e324d55 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10860/files - new: https://git.openjdk.org/jdk/pull/10860/files/d716d2e3..5e324d55 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=09-10 Stats: 75177 lines in 949 files changed: 16590 ins; 55737 del; 2850 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From duke at openjdk.org Fri Nov 4 21:01:40 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 4 Nov 2022 21:01:40 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <Kc1IpZcXV95qbdzVNsLaTKt4VRdafGSxXd1sqVSpsPc=.ba9f3752-817b-4cc1-82b8-edb09c6a0ce8@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> <Kc1IpZcXV95qbdzVNsLaTKt4VRdafGSxXd1sqVSpsPc=.ba9f3752-817b-4cc1-82b8-edb09c6a0ce8@github.com> Message-ID: <p7zf1K8ddqMuzyjT13mfYVS28z-VWZsrsGvOF2F3upM=.359cfea5-4dde-46bc-95d3-1ba6728e65ea@github.com> On Tue, 25 Oct 2022 00:31:07 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> extra whitespace character > > 
src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175:
> 
>> 173:         // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead
>> 174:         // and not affect platforms without intrinsic support
>> 175:         int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH;
> 
> The ByteBuffer version can also benefit from this optimization if it has array as backing storage.

I spent some time looking at `engineUpdate(ByteBuffer buf)`. I think it makes sense to make it into a separate PR. I think I figured out the code, but it's rather 'finicky'. The existing function is already rather clever; there are quite a few cases to get correct (`engineUpdate(byte[] input, int offset, int len)` unrolled the decision tree, so it's easier to reason about).

For future reference, patched but untested:

    void engineUpdate(ByteBuffer buf) {
        int remaining = buf.remaining();
        while (remaining > 0) {
            int bytesToWrite = Integer.min(remaining,
                    BLOCK_LENGTH - blockOffset);

            if (bytesToWrite >= BLOCK_LENGTH) {
                // Have at least one full block in the buf, process all full blocks
                int blockMultipleLength = buf.remaining() & (~(BLOCK_LENGTH-1));
                processMultipleBlocks(buf, blockMultipleLength);
                remaining -= blockMultipleLength;
            } else {
                // We have some left-over data from previous updates, so
                // copy that into the holding block until we get a full block.
                buf.get(block, blockOffset, bytesToWrite);
                blockOffset += bytesToWrite;

                if (blockOffset >= BLOCK_LENGTH) {
                    processBlock(block, 0, BLOCK_LENGTH);
                    blockOffset = 0;
                }
                remaining -= bytesToWrite;
            }
        }
    }

    private void processMultipleBlocks(ByteBuffer buf, int blockMultipleLength) {
        if (buf.hasArray()) {
            byte[] input = buf.array();
            // Index of the buffer's current position within the backing array
            int offset = buf.arrayOffset() + buf.position();

            Objects.checkFromIndexSize(offset, blockMultipleLength, input.length);
            a.checkLimbsForIntrinsic();
            r.checkLimbsForIntrinsic();
            processMultipleBlocks(input, offset, blockMultipleLength);
            // The array path does not move the buffer, so consume the bytes here
            buf.position(buf.position() + blockMultipleLength);
            return;
        }

        while (blockMultipleLength > 0) {
            processBlock(buf, BLOCK_LENGTH);
            blockMultipleLength -= BLOCK_LENGTH;
        }
    }

But it might make more sense to emulate `engineUpdate(byte[] input, int offset, int len)` and unroll the loop. (Hint: to test with a buffer that has no backing array, create a read-only buffer:

    public final boolean hasArray() {
        return (hb != null) && !isReadOnly;
    }

end hint)

-------------

PR: https://git.openjdk.org/jdk/pull/10582

From sviswanathan at openjdk.org Fri Nov 4 21:05:30 2022
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 4 Nov 2022 21:05:30 GMT
Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]
In-Reply-To: <p7zf1K8ddqMuzyjT13mfYVS28z-VWZsrsGvOF2F3upM=.359cfea5-4dde-46bc-95d3-1ba6728e65ea@github.com>
References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com>
 <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com>
 <Kc1IpZcXV95qbdzVNsLaTKt4VRdafGSxXd1sqVSpsPc=.ba9f3752-817b-4cc1-82b8-edb09c6a0ce8@github.com>
 <p7zf1K8ddqMuzyjT13mfYVS28z-VWZsrsGvOF2F3upM=.359cfea5-4dde-46bc-95d3-1ba6728e65ea@github.com>
Message-ID: <Eqsi1O08UMG0Oq-eUNGck6Y1ALmsoKHZ9k-cDS_C-DY=.f9519ec0-7ecd-4f0d-9137-f1656b9dcc4b@github.com>

On Fri, 4 Nov 2022 20:59:10 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175:
>> 
>>> 173:         // Choice 
of 1024 is arbitrary, need enough data blocks to amortize conversion overhead
>>> 174:         // and not affect platforms without intrinsic support
>>> 175:         int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH;
>> 
>> The ByteBuffer version can also benefit from this optimization if it has array as backing storage.
> 
> I spent some time looking at `engineUpdate(ByteBuffer buf)`. I think it makes sense to make it into a separate PR. I think I figured out the code, but it's rather 'finicky'. The existing function is already rather clever; there are quite a few cases to get correct (`engineUpdate(byte[] input, int offset, int len)` unrolled the decision tree, so it's easier to reason about).
> 
> For future reference, patched but untested:
> 
> 
>     void engineUpdate(ByteBuffer buf) {
>         int remaining = buf.remaining();
>         while (remaining > 0) {
>             int bytesToWrite = Integer.min(remaining,
>                     BLOCK_LENGTH - blockOffset);
> 
>             if (bytesToWrite >= BLOCK_LENGTH) {
>                 // Have at least one full block in the buf, process all full blocks
>                 int blockMultipleLength = buf.remaining() & (~(BLOCK_LENGTH-1));
>                 processMultipleBlocks(buf, blockMultipleLength);
>                 remaining -= blockMultipleLength;
>             } else {
>                 // We have some left-over data from previous updates, so
>                 // copy that into the holding block until we get a full block.
>                 buf.get(block, blockOffset, bytesToWrite);
>                 blockOffset += bytesToWrite;
> 
>                 if (blockOffset >= BLOCK_LENGTH) {
>                     processBlock(block, 0, BLOCK_LENGTH);
>                     blockOffset = 0;
>                 }
>                 remaining -= bytesToWrite;
>             }
>         }
>     }
> 
>     private void processMultipleBlocks(ByteBuffer buf, int blockMultipleLength) {
>         if (buf.hasArray()) {
>             byte[] input = buf.array();
>             // Index of the buffer's current position within the backing array
>             int offset = buf.arrayOffset() + buf.position();
> 
>             Objects.checkFromIndexSize(offset, blockMultipleLength, input.length);
>             a.checkLimbsForIntrinsic();
>             r.checkLimbsForIntrinsic();
>             processMultipleBlocks(input, offset, blockMultipleLength);
>             // The array path does not move the buffer, so consume the bytes here
>             buf.position(buf.position() + blockMultipleLength);
>             return;
>         }
> 
>         while (blockMultipleLength > 0) {
>             processBlock(buf, BLOCK_LENGTH);
>             blockMultipleLength -= BLOCK_LENGTH;
>         }
>     }
> 
> 
> But it might make more sense to emulate `engineUpdate(byte[] input, int offset, int len)` and unroll the loop. (Hint: to test with a buffer that has no backing array, create a read-only buffer:
> 
>     public final boolean hasArray() {
>         return (hb != null) && !isReadOnly;
>     }
> 
> end hint)

Sounds good, let us do the ByteBuffer support as a follow-on PR.
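
The read-only-buffer hint and the block-rounding expression from the quoted patch can be checked in isolation. The sketch below is illustrative only: the class name `BufferHintSketch` and its helper methods are hypothetical, and `BLOCK_LENGTH = 16` is assumed here as the Poly1305 block size rather than taken from this thread.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-alone sketch; not part of Poly1305.java.
final class BufferHintSketch {
    // Assumed block size; the mask trick below requires a power of two.
    static final int BLOCK_LENGTH = 16;

    // Largest multiple of BLOCK_LENGTH that is <= len, written the same way as
    // buf.remaining() & (~(BLOCK_LENGTH-1)) in the quoted patch.
    static int fullBlockBytes(int len) {
        return len & ~(BLOCK_LENGTH - 1);
    }

    // A heap buffer created by wrap() exposes its backing array...
    static boolean heapHasArray() {
        return ByteBuffer.wrap(new byte[64]).hasArray();
    }

    // ...but a read-only view of it does not, which exercises the
    // non-array fallback path.
    static boolean readOnlyHasArray() {
        return ByteBuffer.wrap(new byte[64]).asReadOnlyBuffer().hasArray();
    }

    // Index in the backing array that corresponds to the buffer's current position.
    static int arrayIndexOfPosition(ByteBuffer buf) {
        return buf.arrayOffset() + buf.position();
    }

    public static void main(String[] args) {
        System.out.println(fullBlockBytes(100));   // 96, same as (100 / 16) * 16
        System.out.println(heapHasArray());        // true
        System.out.println(readOnlyHasArray());    // false
    }
}
```

A sliced buffer is a convenient way to get a non-zero `arrayOffset()` when testing the array path, since the slice starts at the original buffer's position.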
------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Fri Nov 4 23:01:37 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 4 Nov 2022 23:01:37 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v3] In-Reply-To: <w3_d62ZmgHAQs82ehx7Ie4HrEBjgA-ntmDLyyno8Ies=.3f5e5ba5-8bdc-4676-a555-404d42a32072@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> <rNi-kFIlucDsCkWt1AGcDDz7wCDR-9USkGquiNM9_2I=.ed6b5beb-bd3c-478b-b2d4-622d476e8251@github.com> <HN_ppGL2xcGGzXWAs7-lw9HxLJ063u7gtfIYKB2Rg0I=.78843e37-397b-4e1f-867f-bbe974d4d7cf@github.com> <w3_d62ZmgHAQs82ehx7Ie4HrEBjgA-ntmDLyyno8Ies=.3f5e5ba5-8bdc-4676-a555-404d42a32072@github.com> Message-ID: <YU5iMQHQssoyxB_6s0BoMkyT8Hf4nH6H7HEAQd_jZ00=.620bf27b-d318-4cd6-b13b-4994ff1f6091@github.com> On Fri, 4 Nov 2022 10:30:22 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Interesting! I do hit the assert during JDK build: >> >> # Internal Error (.../src/hotspot/share/oops/instanceKlass.cpp:390), pid=956, tid=6147 >> # Error: assert(this_key != __null) failed >> >> V report_vm_error(char const*, int, char const*, char const*, ...)+0x88 >> V InstanceKlass::set_nest_host(InstanceKlass*)+0x254 >> V SystemDictionary::load_shared_lambda_proxy_class(InstanceKlass*, Handle, Handle, PackageEntry*, JavaThread*)+0x19c >> V SystemDictionaryShared::prepare_shared_lambda_proxy_class(InstanceKlass*, InstanceKlass*, JavaThread*)+0x13c >> V JVM_LookupLambdaProxyClassFromArchive+0x2cc >> C Java_java_lang_invoke_LambdaProxyClassArchive_findFromArchive+0x4c >> j java.lang.invoke.LambdaProxyClassArchive.findFromArchive(...) java.base at 20-internal >> ... >> >> >> Looks like a pre-existing bug to me. > > OK! I'll do a bit more digging. 
FTR Calvin filed [JDK-8296433](https://bugs.openjdk.org/browse/JDK-8296433) to track the issue. ------------- PR: https://git.openjdk.org/jdk/pull/10920 From dholmes at openjdk.org Fri Nov 4 23:28:35 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Nov 2022 23:28:35 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v11] In-Reply-To: <LSPP2f1ESqAS83wDjpnScW9ZL-IBFoAGaUy6XN960cU=.e32e5d6c-5b3e-4834-9c83-3bd7c8a005be@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> <LSPP2f1ESqAS83wDjpnScW9ZL-IBFoAGaUy6XN960cU=.e32e5d6c-5b3e-4834-9c83-3bd7c8a005be@github.com> Message-ID: <zhxrjWFTGPz4ebnfZEl5RYaSDZbxNZD7pCd__fOVBCY=.cedaa8eb-de0c-4697-b72e-109aa321e310@github.com> On Fri, 4 Nov 2022 19:07:29 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote: >> As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. >> >> The text format and contents are tentative, please review. 
>> >> Here is an example output when using `findmethod()`: >> >> "Executing findmethod" >> flags (bitmask): >> 0x01 - print names of methods >> 0x02 - print bytecodes >> 0x04 - print the address of bytecodes >> 0x08 - print info for invokedynamic >> 0x10 - print info for invokehandle >> >> [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} >> 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V >> 0 iconst_0 >> 1 istore_1 >> 2 iload_1 >> 3 iconst_2 >> 4 if_icmpge 24 >> 7 getstatic 7 <Concat0.s/Ljava/lang/String;> >> 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> >> BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> >> arguments[1] = { >> 000 >> } >> ConstantPoolCacheEntry: 4 >> - this: 0x00007fffa0400570 >> - bytecode 1: invokedynamic ba >> - bytecode 2: nop 00 >> - cp index: 13 >> - F1: [ 0x00000008000c8658] >> - F2: [ 0x0000000000000003] >> - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) >> - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] >> - tos: object >> - local signature: 1 >> - has appendix: 1 >> - forced virtual: 0 >> - final: 1 >> - virtual Final: 0 >> - resolution Failed: 0 >> - num Parameters: 02 >> Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; >> appendix: java.lang.invoke.BoundMethodHandle$Species_LL >> {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' >> - ---- fields (total size 5 words): >> - private 'customizationCount' 'B' @12 0 (0x00) >> - private volatile 'updateInProgress' 'Z' @13 false (0x00) >> - private 
final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) >> - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) >> - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) >> - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) >> - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) >> - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) >> ------------- >> 15 putstatic 17 <Concat0.d/Ljava/lang/String;> >> 18 iinc #1 1 >> 21 goto 2 >> 24 return > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'master' into invokeDynamicPrinter > - Fixed copyright > - Fixed gtest > - Fixed code formatting > - Improved formatting > - Added gtest > - changed NULL to nullptr > - Added null check and resource mark > - fixed last trailing whitespace > - Removed trailing whitespace > - ... and 2 more: https://git.openjdk.org/jdk/compare/bb3b1e39...5e324d55 Looks good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10860 From fyang at openjdk.org Sat Nov 5 02:14:29 2022 From: fyang at openjdk.org (Fei Yang) Date: Sat, 5 Nov 2022 02:14:29 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v5] In-Reply-To: <r42XZNXLR_DX6D3zmEz_uvW1dLHJsd4S_UP0enVxBxc=.6d87bbae-66e4-4dfc-8ba8-80ff1c25a177@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <MVDmwCG_U2khyoo3NdlYMjVvCzd5CKaVp99L-fKjyKM=.3e4976f1-e24b-4730-9565-45fa635770a9@github.com> <r42XZNXLR_DX6D3zmEz_uvW1dLHJsd4S_UP0enVxBxc=.6d87bbae-66e4-4dfc-8ba8-80ff1c25a177@github.com> Message-ID: <PD0t1oSOIbOlyeMlvj1fxKyOCBd5SLTAqHYvaIa_ZsI=.47708afc-7b75-4cf0-b8e3-dbbf860996f4@github.com> On Fri, 4 Nov 2022 14:19:08 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Review > > src/hotspot/share/oops/stackChunkOop.inline.hpp line 139: > >> 137: bool stackChunkOopDesc::is_usable_in_chunk(void* p) const { >> 138: #if (defined(X86) || defined(AARCH64) || defined(RISCV64)) && !defined(ZERO) >> 139: HeapWord* start = (HeapWord*)start_address() + sp() - frame::metadata_words; > > This looks platform independent now, doesn't it? I think the cpp conditional can be removed. You can leave it also to the PPC64 port as I'll touch that line again. As you like. I think I will leave it to the PR for PPC64. Thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Sat Nov 5 02:20:29 2022 From: fyang at openjdk.org (Fei Yang) Date: Sat, 5 Nov 2022 02:20:29 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V [v4] In-Reply-To: <WuyZNKkc9GrO5QkM5Hq4Wi4nwjEBFlC8Am5Un2FaAqE=.0921f561-76a7-4504-bcbf-1b86380747c2@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> <a6MAXdT5pkVxtdVKYR4VaKhWNWg3l5lw3KBwH7Wk9MI=.a4981486-26f7-40a9-bfb2-03534e053aca@github.com> <WuyZNKkc9GrO5QkM5Hq4Wi4nwjEBFlC8Am5Un2FaAqE=.0921f561-76a7-4504-bcbf-1b86380747c2@github.com> Message-ID: <jzJtF6IHpB51ag_fd7t2Vpwh4_50-VC2LYbZqb0lSjQ=.4a425886-7f82-43dd-95d6-e73eae78cd49@github.com> On Fri, 4 Nov 2022 08:32:40 GMT, Jie Fu <jiefu at openjdk.org> wrote: > The shared code change looks good to me. > > I'm not sure if it's possible to eliminate all the platform-dependent code. But the current version looks good enough to me. Thanks. Thanks for looking at this. I think we have enough reviews for both the shared code and riscv-specific changes now. Let's proceed so that this won't block reviewing of PR for the loom ppc64 port. ------------- PR: https://git.openjdk.org/jdk/pull/10917 From fyang at openjdk.org Sat Nov 5 02:22:03 2022 From: fyang at openjdk.org (Fei Yang) Date: Sat, 5 Nov 2022 02:22:03 GMT Subject: Integrated: 8286301: Port JEP 425 to RISC-V In-Reply-To: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> References: <MI5D-B9nlsxmn0Ry_kk_BetXwNrW6eCVWqNMd8zGxiM=.c7b50588-ec70-4ab4-aaaa-58a9674f6ab7@github.com> Message-ID: <U8dO2vUfZ3obf5F_6jkVEoLStVx6IjU9GklEbWf0-74=.c55a516d-f015-4d89-a35e-f456fea60e07@github.com> On Mon, 31 Oct 2022 12:41:28 GMT, Fei Yang <fyang at openjdk.org> wrote: > Hi, > > Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. > > This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. 
> Changes to HotSpot shared code are trivial and are always guarded by the RISCV64 macro. So this won't affect the rest of the world in theory.
> 
> There exist some differences in frame structure between AArch64 and RISC-V.
> For AArch64, we have:
> 
>     enum {
>       link_offset = 0,
>       return_addr_offset = 1,
>       sender_sp_offset = 2
>     };
> 
> While for RISC-V, we have:
> 
>     enum {
>       link_offset = -2,
>       return_addr_offset = -1,
>       sender_sp_offset = 0
>     };
> 
> So we need adaptations in some places where the code relies on the value of sender_sp_offset to work.
> Note that the implementation of the Post-call NOPs optimization is not incorporated in this PR as we plan to evaluate its impact on performance further.
> 
> Testing on Linux-riscv64 HiFive Unmatched board:
> - Minimal, Client and Server release & fastdebug build OK.
> - Passed tier1-tier4 tests (release build).
> - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build).
> - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regressions are introduced (release build).

This pull request has now been integrated.
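
To make the difference between the two quoted enums concrete, here is a small sketch. It assumes that the enum values are word offsets relative to the frame pointer and that a word is 8 bytes; the class `FrameOffsetSketch` and its methods are hypothetical names used only for illustration, not HotSpot code.

```java
// Hypothetical illustration of the two frame layouts quoted above.
final class FrameOffsetSketch {
    static final int WORD_BYTES = 8; // assumed 64-bit machine word

    // Offsets in words relative to the frame pointer, copied from the quoted enums.
    enum Layout {
        AARCH64(0, 1, 2),
        RISCV(-2, -1, 0);

        final int linkOffset;
        final int returnAddrOffset;
        final int senderSpOffset;

        Layout(int linkOffset, int returnAddrOffset, int senderSpOffset) {
            this.linkOffset = linkOffset;
            this.returnAddrOffset = returnAddrOffset;
            this.senderSpOffset = senderSpOffset;
        }

        // Address of the slot that lies offsetWords words away from fp.
        long slotAddress(long fp, int offsetWords) {
            return fp + (long) offsetWords * WORD_BYTES;
        }

        long linkSlot(long fp)       { return slotAddress(fp, linkOffset); }
        long returnAddrSlot(long fp) { return slotAddress(fp, returnAddrOffset); }
        long senderSp(long fp)       { return slotAddress(fp, senderSpOffset); }
    }

    public static void main(String[] args) {
        long fp = 0x1000L;
        // AArch64: saved fp at fp, return address above it, sender sp two words up.
        System.out.println(Long.toHexString(Layout.AARCH64.senderSp(fp)));     // 1010
        // RISC-V: fp already equals the sender sp; the saved values sit below it.
        System.out.println(Long.toHexString(Layout.RISCV.senderSp(fp)));       // 1000
        System.out.println(Long.toHexString(Layout.RISCV.returnAddrSlot(fp))); // ff8
    }
}
```

Under these assumptions, any shared code that computes a frame's size or metadata location as a fixed distance derived from `sender_sp_offset` sees 2 words on AArch64 and 0 on RISC-V, which is the kind of place the description says needed adapting.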
Changeset: 91292d56 Author: Fei Yang <fyang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/91292d56a9c2b8010466d105520e6e898ae53679 Stats: 1278 lines in 30 files changed: 1021 ins; 84 del; 173 mod 8286301: Port JEP 425 to RISC-V Co-authored-by: Xiaolin Zheng <xlinzheng at openjdk.org> Reviewed-by: fjiang, xlinzheng, yadongwang, jiefu, rrich ------------- PR: https://git.openjdk.org/jdk/pull/10917 From jvernee at openjdk.org Sat Nov 5 19:48:32 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Sat, 5 Nov 2022 19:48:32 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v2] In-Reply-To: <IxHlukr_bx6t1miZypvnq_8eWAgyouVs1mdNOhFW3bE=.4efc799a-206e-4f05-a28f-fab819cd4334@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IxHlukr_bx6t1miZypvnq_8eWAgyouVs1mdNOhFW3bE=.4efc799a-206e-4f05-a28f-fab819cd4334@github.com> Message-ID: <5pJEOFSd5uXPZ5W8_SC3dPizt8x83yjMPbtt2gmFwfA=.38b93d3f-13f3-4455-ac0a-33dbca8f44bd@github.com> On Fri, 4 Nov 2022 18:23:17 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 17 additional commits since the last revision: > > - Merge branch 'master' into PR_20 > - Merge branch 'master' into PR_20 > - Merge pull request #14 from minborg/small-javadoc > > Update some javadocs > - Update some javadocs > - Revert some javadoc changes > - Merge branch 'master' into PR_20 > - Fix benchmark and test failure > - Merge pull request #13 from minborg/revert-factories > > Revert MemorySegment factories > - Update javadocs after comments > - Revert MemorySegment factories > - ... and 7 more: https://git.openjdk.org/jdk/compare/e1e4e45b...3d933028 Some preliminary comments about some changes I think are missing from this PR (noticed while I was making a patch for the VM changes) I will do a more thorough review after the changes from https://github.com/openjdk/panama-foreign/pull/750 are included as well. src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java line 474: > 472: long bbAddress = NIO_ACCESS.getBufferAddress(bb); > 473: Object base = NIO_ACCESS.getBufferBase(bb); > 474: UnmapperProxy unmapper = NIO_ACCESS.unmapper(bb); Looks like here is also missing the fix that rejects StringCharBuffer: https://github.com/openjdk/panama-foreign/pull/741 I think that is good to include as well. src/java.base/share/classes/jdk/internal/foreign/abi/BindingSpecializer.java line 477: > 475: case UNBOX_ADDRESS -> emitUnboxAddress(); > 476: case DUP -> emitDupBinding(); > 477: case CAST -> emitCast((Binding.Cast) binding); This contains the CAST binding, but not the accompanying VM changes from: https://github.com/openjdk/panama-foreign/pull/720 which removes the now dead code. 
Preferably both changes go together (and the code removal is pretty trivial, so I suggest including it here) src/java.base/share/classes/jdk/internal/foreign/abi/BindingSpecializer.java line 491: > 489: emitLoad(highLevelType, paramIndex2ParamSlot[paramIndex]); > 490: > 491: if (shouldAcquire(paramIndex)) { I can't comment on the actual line below, but this is also missing the fix from: https://github.com/openjdk/panama-foreign/pull/739 (that is a Java-only change as well). I suggest adding that as well. src/java.base/share/classes/jdk/internal/foreign/abi/x64/windows/CallArranger.java line 165: > 163: assert forArguments : "no stack returns"; > 164: // stack > 165: long alignment = Math.max(layout.byteAlignment(), STACK_SLOT_SIZE); This is also missing part of the changes from: https://github.com/openjdk/panama-foreign/pull/728/ but other changes to the shared code are present. The `layout` parameter is not needed here. (see the changes to this file in the original PR) ------------- PR: https://git.openjdk.org/jdk/pull/10872 From fjiang at openjdk.org Sun Nov 6 01:51:47 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Sun, 6 Nov 2022 01:51:47 GMT Subject: RFR: 8296435: RISC-V: Small refactoring for increment/decrement Message-ID: <YkfeG7lrC_tZlPiWq7J7oyWM2o8xnp6i-SGR7mKVyO8=.ef2bbd2d-844d-4ac9-9d70-a35c4a6dd494@github.com> The `increment` and `decrement` use t1 as tmp register, while t1 was the flag register in c2. We can make tmp registers as the arguments of `increment` and `decrement` so that c2 can reuse them. 
-------------

Commit messages:
 - fix
 - fix build
 - add tmp register for increment/decrement

Changes: https://git.openjdk.org/jdk/pull/11005/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11005&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8296435
Stats: 42 lines in 3 files changed: 0 ins; 6 del; 36 mod
Patch: https://git.openjdk.org/jdk/pull/11005.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11005/head:pull/11005

PR: https://git.openjdk.org/jdk/pull/11005

From dholmes at openjdk.org Mon Nov 7 00:31:14 2022
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 7 Nov 2022 00:31:14 GMT
Subject: RFR: 8296401: ConcurrentHashTable::bulk_delete might miss to delete some objects
In-Reply-To: <rWw3Hb1baG5eslMgFnOdEP1Q4avhfYc3uTpjdB9DaP0=.23f34ff0-eff9-421b-9e70-7f119fd90e5d@github.com>
References: <rWw3Hb1baG5eslMgFnOdEP1Q4avhfYc3uTpjdB9DaP0=.23f34ff0-eff9-421b-9e70-7f119fd90e5d@github.com>
Message-ID: <hkEFuiPj5ZuZqUeMnjSmTiOol5D5olw0vbjMESJBHyc=.f039e8a1-8e8b-4b68-9243-e66f4cd09b83@github.com>

On Fri, 4 Nov 2022 13:38:23 GMT, Leo Korinth <lkorinth at openjdk.org> wrote:

> ConcurrentHashTable::bulk_delete might fail to delete some objects if a bucket has more than 256 entries. Current uses of ConcurrentHashTable are not harmed by this behaviour.
> 
> I modified gtest:ConcurrentHashTable to detect the problem (first commit), and fixed the problem in the code (second commit).
> 
> Tests pass tier1-3.

Hi Leo,

Can you explain what the actual bug is, here and/or in the JBS issue? AFAICS your fix is just to add a nested `for(;;)` loop but it is not clear what that actually achieves.

Thanks.
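
The quoted description does not say why entries can be missed, so the following is only a guess at the general shape of such a problem, not the HotSpot code. It assumes a per-bucket pass that can remove at most a fixed number of entries (`BATCH_LIMIT`, a hypothetical name chosen to match the 256 in the report) and shows how repeating the pass until it removes fewer entries than the limit, in the spirit of the nested `for(;;)` loop mentioned above, clears the bucket.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only; this is not ConcurrentHashTable.
final class BulkDeleteSketch {
    // Hypothetical per-pass cap, chosen to match the 256 in the report.
    static final int BATCH_LIMIT = 256;

    // One pass over a bucket: removes at most BATCH_LIMIT matching entries
    // and returns how many were removed.
    static int deleteOnePass(List<Integer> bucket, IntPredicate shouldDelete) {
        int removed = 0;
        Iterator<Integer> it = bucket.iterator();
        while (removed < BATCH_LIMIT && it.hasNext()) {
            if (shouldDelete.test(it.next())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Repeats the pass until it removes fewer entries than the cap. A pass that
    // stays under the cap has scanned the whole bucket, so nothing is left behind.
    static void deleteAll(List<Integer> bucket, IntPredicate shouldDelete) {
        for (;;) {
            if (deleteOnePass(bucket, shouldDelete) < BATCH_LIMIT) {
                return;
            }
        }
    }

    // Builds a bucket holding the values 0 .. size-1.
    static List<Integer> bucketOf(int size) {
        List<Integer> bucket = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            bucket.add(i);
        }
        return bucket;
    }

    public static void main(String[] args) {
        List<Integer> once = bucketOf(600);
        deleteOnePass(once, v -> true);
        System.out.println(once.size());   // 344 entries were missed

        List<Integer> looped = bucketOf(600);
        deleteAll(looped, v -> true);
        System.out.println(looped.size()); // 0
    }
}
```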
-------------

PR: https://git.openjdk.org/jdk/pull/10983

From fyang at openjdk.org Mon Nov 7 01:49:28 2022
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 7 Nov 2022 01:49:28 GMT
Subject: RFR: 8296435: RISC-V: Small refactoring for increment/decrement
In-Reply-To: <YkfeG7lrC_tZlPiWq7J7oyWM2o8xnp6i-SGR7mKVyO8=.ef2bbd2d-844d-4ac9-9d70-a35c4a6dd494@github.com>
References: <YkfeG7lrC_tZlPiWq7J7oyWM2o8xnp6i-SGR7mKVyO8=.ef2bbd2d-844d-4ac9-9d70-a35c4a6dd494@github.com>
Message-ID: <nyVf2hIFJ2QnB0XiU2ZhbAuvoG_hF2AKdB98Ze2Ehrc=.2fd1aa14-f007-4dc0-b9ce-7c091c84bf34@github.com>

On Sun, 6 Nov 2022 01:42:50 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:

> The `increment` and `decrement` use t1 as tmp register, while t1 was the flag register in c2.
> We can make tmp registers as the arguments of `increment` and `decrement` so that c2 can reuse them.
> 
> Testing:
> 
> full tier1 tests passed on Linux-riscv64 HiFive Unmatched board with release build

Looks good. Thanks.

-------------

Marked as reviewed by fyang (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11005

From dzhang at openjdk.org Mon Nov 7 02:07:04 2022
From: dzhang at openjdk.org (Dingli Zhang)
Date: Mon, 7 Nov 2022 02:07:04 GMT
Subject: RFR: 8296447: RISC-V: Make the operands order of vrsub_vx/vrsub_vi consistent with RVV 1.0 spec
Message-ID: <arChiUBGra90rJKJdaDfUOpyTPRThqevpkrAPRja3cM=.e800c3f5-8bcc-4770-9c56-b9ec955bb760@github.com>

Hi,

At the moment, the operand order of `vrsub_vx` and `vrsub_vi` is not the same as in the RVV 1.0 spec [1]. These instructions use the wrong assembly syntax pattern for vector binary arithmetic instructions (multiply-add) [2].

`vrsub_vx` is classified under `Vector Single-Width Integer Add and Subtract` in the RVV 1.0 spec, but is currently classified under `Vector Single-Width Integer Multiply-Add Instructions`, and its function is generated under the corresponding macros, which results in the operands `Vs2` and `Rs1` being in the reverse order compared to the spec.

`vrsub_vi` has its own separate macro definition to generate the corresponding function and the order of these operands(`Vs2` and `imm`) is reversed too. I think it is better to adjust the operands order of these two instructions to be consistent with the spec. Please take a look and have some reviews. Thanks a lot. [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc [2] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#101-vector-arithmetic-instruction-encoding ## Testing: - hotspot and jdk tier1 on unmatched board without new failures - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu ------------- Commit messages: - Make the operands order of vrsub_vx/vrsub_vi consistent with rvv spec Changes: https://git.openjdk.org/jdk/pull/11009/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11009&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296447 Stats: 11 lines in 3 files changed: 1 ins; 2 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/11009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11009/head:pull/11009 PR: https://git.openjdk.org/jdk/pull/11009 From yadongwang at openjdk.org Mon Nov 7 03:48:29 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Mon, 7 Nov 2022 03:48:29 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v6] In-Reply-To: <hVVzJNW5i13eZ3g_aRqJJjuhs98ZOwJdAl4M2gvFNcU=.858ce35f-5e93-4dba-8ef5-1ce307ce3922@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <hVVzJNW5i13eZ3g_aRqJJjuhs98ZOwJdAl4M2gvFNcU=.858ce35f-5e93-4dba-8ef5-1ce307ce3922@github.com> Message-ID: <YOhpqH4hXpsngfmHMhoe6kPPssW1Zz8AxhcemX9AOTA=.41aca39f-ec96-459c-9917-2f32106ade70@github.com> On Fri, 4 Nov 2022 10:07:04 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> The OpenJDK supports generating prefetch instructions on most 
platforms. RISC-V supports prefetch instructions through the Zicbop extension. We want to make sure we use these instructions whenever they are available.
>>
>> It passes `hotspot:tier1` test suite
>
> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision:
>
> fixup! review

src/hotspot/cpu/riscv/riscv.ad line 5202:

> 5200: __ addi(t0, as_Register($mem$$base), $mem$$disp);
> 5201: __ prefetch_w(t0, 0);
> 5202: }

It'd be better to also handle the case where the immediate does not fit in instructions like prefetch.w or addi.

-------------

PR: https://git.openjdk.org/jdk/pull/10884

From xlinzheng at openjdk.org Mon Nov 7 04:10:34 2022
From: xlinzheng at openjdk.org (Xiaolin Zheng)
Date: Mon, 7 Nov 2022 04:10:34 GMT
Subject: RFR: 8296448: RISC-V: Fix temp usages of heapbase register which is killed by MacroAssembler::en/decode_klass_not_null
Message-ID: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com>

Please see the JBS issue for more crash details.

To reproduce using a cross-compiled build:

# dump one cds-nocoops.jsa
<java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xlog:cds* -version
# reproduce
<java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:on -XX:-TieredCompilation -Xlog:cds* -Xlog:gc+metaspace=info -Xshare:on -jar renaissance-gpl-0.14.1.jar -r 1 movie-lens

`MacroAssembler::en/decode_klass_not_null` uses the heapbase register as a temp register in the interpreter, which may kill the in-use value when C2 compilation and `UseCompressedClassPointers` are enabled while `UseCompressedOops` is disabled. C1 does not have this issue because xheapbase is not one of its allocation candidates. When CDS is enabled, the narrow klass base is mapped to some address like `0x0000000800000000`, so `MacroAssembler::decode_klass_not_null`, which lacks registers, will use `xheapbase` as a temp to load the klass base and kill the register in the interpreter.
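The clobbering described here can be modelled in a few lines. This is a rough sketch, not the actual instruction sequence: the `Regs` struct is a hypothetical three-register model, the shift of 3 used in the example is an assumption, and only the base address `0x0000000800000000` is taken from the text above. Decoding computes `base + (narrow << shift)`, and the base has to live in some register while the add happens.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register model: destination, a spare temp, and the heapbase register.
struct Regs {
  uint64_t dst;
  uint64_t tmp;
  uint64_t heapbase;
};

// Variant that borrows heapbase as scratch: whatever value lived there is lost.
void decode_clobbering(Regs& r, uint64_t klass_base, unsigned shift) {
  assert(shift < 32);
  r.heapbase = klass_base;                // base materialized into heapbase
  r.dst = r.heapbase + (r.dst << shift);  // dst held the narrow klass value
}

// Variant with an explicit temp argument: heapbase is left untouched.
void decode_with_tmp(Regs& r, uint64_t klass_base, unsigned shift) {
  assert(shift < 32);
  r.tmp = klass_base;
  r.dst = r.tmp + (r.dst << shift);
}
```

Both variants produce the same decoded pointer. The difference is only whether the value previously held in `heapbase` survives, which is the property the extra `tmp` argument is meant to restore.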
So adding a `-XX:+DeoptimizeALot` can speedily reproduce the issue. To solve this, we shall decouple the xheapbase used as a temp register in `MacroAssembler::en/decode_klass_not_null`. AArch64 has advanced instructions so one register is enough to handle the logic. But in RISC-V we require at least two. This patch introduces another argument `tmp` to related functions to decouple and eliminate such heapbase usages. One thing that deserves noticing is the `cmp_klass` case, which usually gets used at the beginning of a method entry when t1 is used as ic holder klass and t0 is occupied there. These points are special for nearly all registers except argument registers and regs used for special purposes (thread register, etc.) can be used. I propose to use a call-clobbered t2 register here, to keep aligning the i2c2i_adapter logic[1]. Tested hotspot tier1~4 on QEMU; jdk tier1~tier2 and hotspot tier1~tier2 on my Hifive unmatched board, and the reproducible movie-lens benchmark. Thanks, Xiaolin [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L629 ------------- Commit messages: - Simply rename tmp -> tmp1 - Remove useless - Refactor - Fix Changes: https://git.openjdk.org/jdk/pull/11010/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11010&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296448 Stats: 46 lines in 8 files changed: 8 ins; 6 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/11010.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11010/head:pull/11010 PR: https://git.openjdk.org/jdk/pull/11010 From dzhang at openjdk.org Mon Nov 7 06:05:16 2022 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 7 Nov 2022 06:05:16 GMT Subject: RFR: 8296447: RISC-V: Make the operands order of vrsub_vx/vrsub_vi consistent with RVV 1.0 spec [v2] In-Reply-To: <arChiUBGra90rJKJdaDfUOpyTPRThqevpkrAPRja3cM=.e800c3f5-8bcc-4770-9c56-b9ec955bb760@github.com> References: 
<arChiUBGra90rJKJdaDfUOpyTPRThqevpkrAPRja3cM=.e800c3f5-8bcc-4770-9c56-b9ec955bb760@github.com> Message-ID: <I7ynViT1NGZhHBBGb37Twqt9grDcUDGB9eA9aOdMe38=.bfbd99ec-2134-4b0c-9f96-a50beda79dde@github.com> > Hi, > > At the moment, the operands order of `vrsub_vx` and ` vrsub_vi` is not the same as in the RVV1.0 spec[1]. These instructions use the wrong assembly syntax pattern for vector binary arithmetic instructions (multiply-add)[2]. > > `vrsub_vx` was classified as `Vector Single-Width Integer Add and Subtract` in rvv1.0 spec, but is currently classified as `Vector Single-Width Integer Multiply-Add Instructions` and generate the functions under the corresponding macros, which results in the reverse order of the operands `Vs2` and `Rs1` compared to the spec. > > `vrsub_vi` has its own separate macro definition to generate the corresponding function and the order of these operands(`Vs2` and `imm`) is reversed too. > > I think it is better to adjust the operands order of these two instructions to be consistent with the spec. > > Please take a look and have some reviews. Thanks a lot. 
> > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > [2] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#101-vector-arithmetic-instruction-encoding > > ## Testing: > > - hotspot and jdk tier1 on unmatched board without new failures > - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu > - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Remove duplicate macro definition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11009/files - new: https://git.openjdk.org/jdk/pull/11009/files/de67289b..0930a669 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11009&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11009&range=00-01 Stats: 10 lines in 1 file changed: 0 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11009/head:pull/11009 PR: https://git.openjdk.org/jdk/pull/11009 From luhenry at openjdk.org Mon Nov 7 07:30:45 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 7 Nov 2022 07:30:45 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v6] In-Reply-To: <YOhpqH4hXpsngfmHMhoe6kPPssW1Zz8AxhcemX9AOTA=.41aca39f-ec96-459c-9917-2f32106ade70@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <hVVzJNW5i13eZ3g_aRqJJjuhs98ZOwJdAl4M2gvFNcU=.858ce35f-5e93-4dba-8ef5-1ce307ce3922@github.com> <YOhpqH4hXpsngfmHMhoe6kPPssW1Zz8AxhcemX9AOTA=.41aca39f-ec96-459c-9917-2f32106ade70@github.com> Message-ID: <qrGTLK-U1f0Dgdgq8dLofSoVWW00UBYAuVz8m2hOjPQ=.e2e18141-41ea-43ea-a215-ba2e6f832d80@github.com> On Mon, 7 Nov 2022 03:44:20 GMT, Yadong Wang <yadongwang at openjdk.org> wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> 
fixup! review
>
> src/hotspot/cpu/riscv/riscv.ad line 5202:
>
>> 5200: __ addi(t0, as_Register($mem$$base), $mem$$disp);
>> 5201: __ prefetch_w(t0, 0);
>> 5202: }
>
> It'd better to generate the case that handles the imm not fit for instructions like prefetch.w or addi.

I'm not sure I understand what you're suggesting.

-------------

PR: https://git.openjdk.org/jdk/pull/10884

From jnimeh at openjdk.org Mon Nov 7 07:34:06 2022
From: jnimeh at openjdk.org (Jamil Nimeh)
Date: Mon, 7 Nov 2022 07:34:06 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
Message-ID: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>

This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets:

- x86_64: AVX, AVX2 and AVX512
- aarch64: platforms that support the advanced SIMD instructions

Microbenchmark results (Note: ChaCha20-Poly1305 numbers do not include the pending Poly1305 intrinsics to be delivered in #10582)

x86_64
Processor: 4x Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz

Java only (-XX:-UseChaCha20Intrinsics)
--------------------------------------
Benchmark (dataSize) Mode Cnt Score Error Units
ChaCha20.decrypt 256 thrpt 40 772956.829 ± 4434.965 ops/s
ChaCha20.decrypt 1024 thrpt 40 230478.075 ± 660.617 ops/s
ChaCha20.decrypt 4096 thrpt 40 61504.367 ± 187.485 ops/s
ChaCha20.decrypt 16384 thrpt 40 15671.893 ± 59.860 ops/s
ChaCha20.encrypt 256 thrpt 40 793708.698 ± 3587.562 ops/s
ChaCha20.encrypt 1024 thrpt 40 232413.842 ± 808.766 ops/s
ChaCha20.encrypt 4096 thrpt 40 61586.483 ± 94.821 ops/s
ChaCha20.encrypt 16384 thrpt 40 15749.637 ± 34.497 ops/s

ChaCha20Poly1305.decrypt 256 thrpt 40 219991.514 ± 2117.364 ops/s
ChaCha20Poly1305.decrypt 1024 thrpt 40 101672.568 ± 1921.214 ops/s
ChaCha20Poly1305.decrypt 4096 thrpt 40 32582.073 ± 946.061 ops/s
ChaCha20Poly1305.decrypt 16384 thrpt 40 8485.793 ± 26.348 ops/s
ChaCha20Poly1305.encrypt 256 thrpt 40 291605.327 ± 2893.898 ops/s
ChaCha20Poly1305.encrypt 1024 thrpt 40 121034.948 ± 2545.312 ops/s
ChaCha20Poly1305.encrypt 4096 thrpt 40 32657.343 ± 114.322 ops/s
ChaCha20Poly1305.encrypt 16384 thrpt 40 8527.834 ± 33.711 ops/s

Intrinsics enabled (-XX:UseAVX=1)
---------------------------------
Benchmark (dataSize) Mode Cnt Score Error Units
ChaCha20.decrypt 256 thrpt 40 1293211.662 ± 9833.892 ops/s
ChaCha20.decrypt 1024 thrpt 40 450135.559 ± 1614.303 ops/s
ChaCha20.decrypt 4096 thrpt 40 123675.797 ± 576.160 ops/s
ChaCha20.decrypt 16384 thrpt 40 31707.566 ± 93.988 ops/s
ChaCha20.encrypt 256 thrpt 40 1338667.215 ± 12012.240 ops/s
ChaCha20.encrypt 1024 thrpt 40 453682.363 ± 2559.322 ops/s
ChaCha20.encrypt 4096 thrpt 40 124785.645 ± 394.535 ops/s
ChaCha20.encrypt 16384 thrpt 40 31788.969 ± 90.770 ops/s

ChaCha20Poly1305.decrypt 256 thrpt 40 250683.639 ± 3990.340 ops/s
ChaCha20Poly1305.decrypt 1024 thrpt 40 131000.144 ± 2895.410 ops/s
ChaCha20Poly1305.decrypt 4096 thrpt 40 45215.542 ± 1368.148 ops/s
ChaCha20Poly1305.decrypt 16384 thrpt 40 11879.307 ± 55.006 ops/s
ChaCha20Poly1305.encrypt 256 thrpt 40 355255.774 ± 5397.267 ops/s
ChaCha20Poly1305.encrypt 1024 thrpt 40 156057.380 ± 4294.091 ops/s
ChaCha20Poly1305.encrypt 4096 thrpt 40 47016.845 ± 1618.779 ops/s
ChaCha20Poly1305.encrypt 16384 thrpt 40 12113.919 ± 45.792 ops/s

Intrinsics enabled (-XX:UseAVX=2)
---------------------------------
Benchmark (dataSize) Mode Cnt Score Error Units
ChaCha20.decrypt 256 thrpt 40 1824729.604 ± 12130.198 ops/s
ChaCha20.decrypt 1024 thrpt 40 746024.477 ± 3921.472 ops/s
ChaCha20.decrypt 4096 thrpt 40 219662.823 ± 2128.901 ops/s
ChaCha20.decrypt 16384 thrpt 40 57198.868 ± 221.973 ops/s
ChaCha20.encrypt 256 thrpt 40 1893810.127 ± 21870.718 ops/s
ChaCha20.encrypt 1024 thrpt 40 758024.511 ± 5414.552 ops/s
ChaCha20.encrypt 4096 thrpt 40 224032.805 ± 935.309 ops/s
ChaCha20.encrypt 16384 thrpt 40 58112.296 ± 498.048 ops/s

ChaCha20Poly1305.decrypt 256 thrpt 40 260529.149 ± 4298.662 ops/s
ChaCha20Poly1305.decrypt 1024 thrpt 40 144967.984 ± 4558.697 ops/s
ChaCha20Poly1305.decrypt 4096 thrpt 40 50047.575 ± 171.204 ops/s
ChaCha20Poly1305.decrypt 16384 thrpt 40 13976.999 ± 72.299 ops/s
ChaCha20Poly1305.encrypt 256 thrpt 40 378971.408 ± 9324.721 ops/s
ChaCha20Poly1305.encrypt 1024 thrpt 40 179361.248 ± 7968.109 ops/s
ChaCha20Poly1305.encrypt 4096 thrpt 40 55727.145 ± 2860.765 ops/s
ChaCha20Poly1305.encrypt 16384 thrpt 40 14205.830 ± 59.411 ops/s

Intrinsics enabled (-XX:UseAVX=3)
---------------------------------
Benchmark (dataSize) Mode Cnt Score Error Units
ChaCha20.decrypt 256 thrpt 40 1182958.956 ± 7782.532 ops/s
ChaCha20.decrypt 1024 thrpt 40 1003530.400 ± 10315.996 ops/s
ChaCha20.decrypt 4096 thrpt 40 339428.341 ± 2376.804 ops/s
ChaCha20.decrypt 16384 thrpt 40 92903.498 ± 1112.425 ops/s
ChaCha20.encrypt 256 thrpt 40 1266584.736 ± 5101.597 ops/s
ChaCha20.encrypt 1024 thrpt 40 1059717.173 ± 9435.649 ops/s
ChaCha20.encrypt 4096 thrpt 40 350520.581 ± 2787.593 ops/s
ChaCha20.encrypt 16384 thrpt 40 95181.548 ± 1638.579 ops/s

ChaCha20Poly1305.decrypt 256 thrpt 40 200722.479 ± 2045.896 ops/s
ChaCha20Poly1305.decrypt 1024 thrpt 40 124660.386 ± 3869.517 ops/s
ChaCha20Poly1305.decrypt 4096 thrpt 40 44059.327 ± 143.765 ops/s
ChaCha20Poly1305.decrypt 16384 thrpt 40 12412.936 ± 54.845 ops/s
ChaCha20Poly1305.encrypt 256 thrpt 40 274528.005 ± 2945.416 ops/s
ChaCha20Poly1305.encrypt 1024 thrpt 40 145146.188 ± 857.254 ops/s
ChaCha20Poly1305.encrypt 4096 thrpt 40 47045.637 ± 128.049 ops/s
ChaCha20Poly1305.encrypt 16384 thrpt 40 12643.929 ± 55.748 ops/s

aarch64
Processor: 2 x CPU implementer : 0x41, architecture: 8, variant : 0x3, part : 0xd0c, revision : 1

Java only (-XX:-UseChaCha20Intrinsics)
--------------------------------------
Benchmark (dataSize) Mode Cnt Score Error Units
ChaCha20.decrypt 256 thrpt 40 1301037.920 ± 1734.836 ops/s
ChaCha20.decrypt 1024 thrpt 40 387115.013 ± 1122.264 ops/s
ChaCha20.decrypt 4096 thrpt 40 102591.108 ± 229.456 ops/s
ChaCha20.decrypt 16384 thrpt 40 25878.583 ± 89.351 ops/s
ChaCha20.encrypt 256 thrpt 40 1332737.880 ± 2478.508 ops/s
ChaCha20.encrypt 1024 thrpt 40 390288.663 ± 2361.851 ops/s
ChaCha20.encrypt 4096 thrpt 40 101882.728 ± 744.907 ops/s
ChaCha20.encrypt 16384 thrpt 40 26001.888 ± 71.907 ops/s

ChaCha20Poly1305.decrypt 256 thrpt 40 351189.393 ± 2209.148 ops/s
ChaCha20Poly1305.decrypt 1024 thrpt 40 142960.999 ± 361.619 ops/s
ChaCha20Poly1305.decrypt 4096 thrpt 40 42437.822 ± 85.557 ops/s
ChaCha20Poly1305.decrypt 16384 thrpt 40 11173.152 ± 24.969 ops/s
ChaCha20Poly1305.encrypt 256 thrpt 40 444870.664 ± 12571.799 ops/s
ChaCha20Poly1305.encrypt 1024 thrpt 40 158481.143 ± 2149.208 ops/s
ChaCha20Poly1305.encrypt 4096 thrpt 40 43610.721 ± 282.795 ops/s
ChaCha20Poly1305.encrypt 16384 thrpt 40 11150.783 ± 27.911 ops/s

Intrinsics enabled
------------------
Benchmark (dataSize) Mode Cnt Score Error Units
ChaCha20.decrypt 256 thrpt 40 1907215.648 ± 3163.767 ops/s
ChaCha20.decrypt 1024 thrpt 40 631804.007 ± 736.430 ops/s
ChaCha20.decrypt 4096 thrpt 40 172280.991 ± 362.190 ops/s
ChaCha20.decrypt 16384 thrpt 40 44150.254 ± 98.927 ops/s
ChaCha20.encrypt 256 thrpt 40 1990050.859 ± 6380.625 ops/s
ChaCha20.encrypt 1024 thrpt 40 636574.405 ± 3332.471 ops/s
ChaCha20.encrypt 4096 thrpt 40 173258.615 ± 327.199 ops/s
ChaCha20.encrypt 16384 thrpt 40 44191.925 ± 72.996 ops/s

ChaCha20Poly1305.decrypt 256 thrpt 40 360555.774 ± 1988.467 ops/s
ChaCha20Poly1305.decrypt 1024 thrpt 40 162093.489 ± 413.684 ops/s
ChaCha20Poly1305.decrypt 4096 thrpt 40 50799.888 ± 110.955 ops/s
ChaCha20Poly1305.decrypt 16384 thrpt 40 13560.165 ± 32.208 ops/s
ChaCha20Poly1305.encrypt 256 thrpt 40 458079.724 ± 13746.235 ops/s
ChaCha20Poly1305.encrypt 1024 thrpt 40 188228.966 ± 3498.480 ops/s
ChaCha20Poly1305.encrypt 4096 thrpt 40 52665.733 ± 151.740 ops/s
ChaCha20Poly1305.encrypt 16384 thrpt 40 13606.192 ± 52.134 ops/s

Special thanks to the folks who have made many helpful comments while this PR was in draft form.

-------------

Commit messages:
 - consolidate single-structure ld_st methods
 - Add intrinsic tests that target specific SIMD instruction sets
 - add explicit int cast on counter rollover protection
 - Merge with main
 - expand input sizes for ChaCha20 and ChaCha20-Poly1305 micro benchmarks
 - rename chapoly to chacha
 - make alg-specific stub/macro files exclusive to chacha20
 - Remove stubRoutines constant generation method, replace using emit_int64/adr
 - Use block-parallel intrinsic, remove qr-parallel intrinsic, use sub/cbnz for loop control
 - Minor fixes from comments
 - ... and 30 more: https://git.openjdk.org/jdk/compare/c7b95a89...c79abe34

Changes: https://git.openjdk.org/jdk/pull/7702/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=7702&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8247645
Stats: 1590 lines in 30 files changed: 1552 ins; 4 del; 34 mod
Patch: https://git.openjdk.org/jdk/pull/7702.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/7702/head:pull/7702

PR: https://git.openjdk.org/jdk/pull/7702

From aph at openjdk.org Mon Nov 7 07:34:09 2022
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 7 Nov 2022 07:34:09 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
Message-ID: <Beacw6Zz39Sfy-LwdkOi7Q9scH-i3fqtz_sVygbWdi0=.01208ee7-838d-4a6e-9de5-ad2561c5e337@github.com>

On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce.
> Intrinsics have been written for the following platforms and instruction sets:
>
> - x86_64: AVX, AVX2 and AVX512
> - aarch64: platforms that support the advanced SIMD instructions
>
> [...]
>
> Special thanks to the folks who have made many helpful comments while this PR was in draft form.

What's the status of this? Please my friend, let's get this finished or I'm going to have to do it myself.
src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2521:

> 2519: #undef INSN3
> 2520: #undef INSN4
> 2521:

This code to handle the AdvSIMD load/store single structure and AdvSIMD load/store single structure (post-indexed) is excessive. Every one of these instructions has the format

`0|Q|0011010|L|R|00000|opcode|S|size|Rn|Rt`

or

`0|Q|0011011|L|R|   Rm|opcode|S|size|Rn|Rt`

Perhaps consider using a `RegSet regs` for the registers. Then the instruction encoding to use (1, 2, 3, or 4 consecutive registers) can be picked up from `regs.size()`. There only needs to be a single routine for all of the `ld_st` variants.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4068:

> 4066: __ ext(c, __ T16B, c, c, cCnt); \
> 4067: __ ext(d, __ T16B, d, d, dCnt); \
> 4068:

There's a fairly extensive use of macros here for the rounds, but I don't think there's any need for them to be macros. `SHIFT_LANES` and all the other macros here should be functions. This would reduce the size of the libjvm.so binary.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4141:

> 4139: // rotation tbl instruction.
> 4140: __ lea(tmpAddr, ExternalAddress(
> 4141: StubRoutines::aarch64::chacha20_constdata()));

Better to move `cc20_gen_constdata()` to the start of `cc20_gen_constdata()`, mark it with a `Label`, and use `adr(tmpAddr, LABEL);`.
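To illustrate the "single routine" idea from the first comment, here is a minimal sketch that packs the fields of the two quoted layouts into one 32-bit word. It is not the HotSpot assembler API: the `LdStSingle` struct and `encode_ld_st_single` are hypothetical names, and the bit positions are simply read off the layout strings above (1 + 1 + 7 + 1 + 1 + 5 + 3 + 1 + 2 + 5 + 5 = 32 bits). How a register count maps to `R` and `opcode` is left to the caller here.

```cpp
#include <cassert>
#include <cstdint>

// Fields in the order of the quoted layout. `post_indexed` selects between the
// 0011010 (Rm field all zeros) and 0011011 (Rm field used) forms.
struct LdStSingle {
  unsigned Q, L, R, Rm, opcode, S, size, Rn, Rt;
  bool post_indexed;
};

// One encoder for both layouts; the field widths are checked up front.
uint32_t encode_ld_st_single(const LdStSingle& f) {
  assert(f.Q < 2 && f.L < 2 && f.R < 2 && f.S < 2);
  assert(f.Rm < 32 && f.Rn < 32 && f.Rt < 32);
  assert(f.opcode < 8 && f.size < 4);
  assert(f.post_indexed || f.Rm == 0);  // non-post-indexed form has 00000 here

  uint32_t insn = 0;                              // bit 31 is 0
  insn |= static_cast<uint32_t>(f.Q) << 30;
  insn |= (f.post_indexed ? 0x1Bu : 0x1Au) << 23; // 0011011 or 0011010
  insn |= static_cast<uint32_t>(f.L) << 22;
  insn |= static_cast<uint32_t>(f.R) << 21;
  insn |= static_cast<uint32_t>(f.Rm) << 16;
  insn |= static_cast<uint32_t>(f.opcode) << 13;
  insn |= static_cast<uint32_t>(f.S) << 12;
  insn |= static_cast<uint32_t>(f.size) << 10;
  insn |= static_cast<uint32_t>(f.Rn) << 5;
  insn |= static_cast<uint32_t>(f.Rt);
  return insn;
}
```

Because every variant differs only in field values, a table or a small function mapping the register count to those values could feed this one encoder instead of a family of macros.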
-------------

PR: https://git.openjdk.org/jdk/pull/7702

From jnimeh at openjdk.org Mon Nov 7 07:34:11 2022
From: jnimeh at openjdk.org (Jamil Nimeh)
Date: Mon, 7 Nov 2022 07:34:11 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
Message-ID: <PyjrUPWy5yvYJIb7Vu3s8_H2XEVWlxZ_ffHXFhDY2bg=.ef20c323-880b-4b72-aff9-4445dd2dfe1d@github.com>

On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets:
>
> - x86_64: AVX, AVX2 and AVX512
> - aarch64: platforms that support the advanced SIMD instructions
>
> [...]
>
> Special thanks to the folks who have made many helpful comments while this PR was in draft form.

Work is ongoing. I'm making a few refinements on the x86_64 side and will remove x86_32 stub generators but hopefully will open this up for formal review soon.

I've also extended the single-structure st4 to now do single structure st1/2/3/4. I just needed to do a little internal playtesting with them to make sure I was still getting the correct results. I don't plan on using st1/2/3 but since they all use the same opcode generation macros as st4 I figured it would be worth including them. That will all show up in my next commit/push.

FYI, I'm holding off on some changes that @iwanowww had suggested in order to wait for #10111 and #10124 to integrate (but more for the former). I think I may end up shifting the CC20 intrinsics into separate files like Vladimir is proposing for AES. Also it has been a while since I've merged the master branch so it could do with a refresh to get 10111 in there.

Quick update: I've run into a strange "Unschedulable graph" issue being raised at the C2 layer of things. It happens specifically with the ChaCha20Poly1305.decrypt microbenchmark and only on AVX512 (with -XX:UseAVX=3). Investigation is ongoing, but points away (right now) from the stub itself and may be a latent C2 issue that is being uncovered. I have run hundreds of thousands of AVX512 cc20-p1305 decrypts of various sizes outside the microbenchmark and never run into this. I will share more as I learn it.
Quick update on the unschedulable graph issue: It appears that we're running into an issue related to either [JDK-8252848](https://bugs.openjdk.org/browse/JDK-8252848) or [JDK-8266951](https://bugs.openjdk.org/browse/JDK-8266951). A new issue to track this has been created in [JDK-8296233](https://bugs.openjdk.org/browse/JDK-8296233). While this has only ever been seen thus far with the ChaCha20Poly1305.decrypt microbenchmark when -XX:UseAVX=3 is employed, the nature of the issue is such that it could happen with any of the intrinsics since it is triggered more by the library call change to C2's IR. But this has never been seen outside of the current narrow configuration to date.

Good news. It turns out that [JDK-8292780](https://bugs.openjdk.org/browse/JDK-8292780) is a fix for the underlying issue that caused the benchmark to crash. Once I did a pull/merge and retested the benchmarks are no longer failing.

src/hotspot/cpu/x86/stubGenerator_x86_32.cpp line 3636:

> 3634:     const XMMRegister zmm_cState = xmm6;
> 3635:     const XMMRegister zmm_dState = xmm7;
> 3636:     const XMMRegister zmm_addMask = xmm8;

Whoops! It looks like there may not be an xmm8 register available to 32-bit architectures. This may need a little creative restructuring in order to make it work. Or we might just add from the ExternalAddress directly in this specific case.
-------------

PR: https://git.openjdk.org/jdk/pull/7702

From jnimeh at openjdk.org  Mon Nov  7 07:34:12 2022
From: jnimeh at openjdk.org (Jamil Nimeh)
Date: Mon, 7 Nov 2022 07:34:12 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <Beacw6Zz39Sfy-LwdkOi7Q9scH-i3fqtz_sVygbWdi0=.01208ee7-838d-4a6e-9de5-ad2561c5e337@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
 <Beacw6Zz39Sfy-LwdkOi7Q9scH-i3fqtz_sVygbWdi0=.01208ee7-838d-4a6e-9de5-ad2561c5e337@github.com>
Message-ID: <2t37RDNwT7XZqqddyBSaUlzj21ERuVhy8XH3EN8RCb4=.e878bb9e-9b90-499e-8289-ac3abe6b62db@github.com>

On Fri, 2 Sep 2022 09:32:56 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets:
>>
>> - x86_64: AVX, AVX2 and AVX512
>> - aarch64: platforms that support the advanced SIMD instructions
>>
>> Microbenchmark results (Note: ChaCha20-Poly1305 numbers do not include the pending Poly1305 intrinsics to be delivered in #10582)
>>
>> [...]
>>
>> Special thanks to the folks who have made many helpful comments while this PR was in draft form.
>
> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2521:
>
>> 2519: #undef INSN3
>> 2520: #undef INSN4
>> 2521:
>
> This code to handle the AdvSIMD load/store single structure and AdvSIMD load/store single structure (post-indexed) is excessive.
>
> Every one of these instructions has the format,
>
> `0|Q|0011010|L|R|00000|opcode|S|size|Rn|Rt`
>
> or
>
> `0|Q|0011011|L|R| Rm|opcode|S|size|Rn|Rt`
>
> Perhaps consider using a `RegSet regs` for the registers. Then the instruction encoding to use (1, 2, 3, or 4 consecutive registers) can be picked up from `regs.size()`. There only needs to be a single routine for all of the `ld_st` variants.

Thanks for the suggestion. I will look into this. I can see how `regs.size()` could simplify these macros.

> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4068:
>
>> 4066:     __ ext(c, __ T16B, c, c, cCnt); \
>> 4067:     __ ext(d, __ T16B, d, d, dCnt); \
>> 4068:
>
> There's a fairly extensive use of macros here for the rounds, but I don't think there's any need for them to be macros. `SHIFT_LANES` and all the other macros here should be functions. This would reduce the size of the libjvm.so binary.

Thanks for the feedback.
I've been wondering if I might need something like a macroAssembler_<arch>_chapoly.cpp file to handle these kinds of things and future functions for Poly1305 when I start in on that. I wasn't aware of the impact on libjvm.so going the macro approach versus functions. I'll pull these out to functions.

> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4141:
>
>> 4139:     // rotation tbl instruction.
>> 4140:     __ lea(tmpAddr, ExternalAddress(
>> 4141:             StubRoutines::aarch64::chacha20_constdata()));
>
> Better to move `cc20_gen_constdata()` to the start of `cc20_gen_constdata()`, mark it with a `Label`, and use `adr(tmpAddr, LABEL);`.

I think I see what you're saying from looking at `generate_sha1_implCompress()` and how it uses adr. I also see what looks like a similar approach in some functions in the same file where it defines the constant value via a `static const uint64_t[] foo = { ... };` and then loads that address via `lea(reg, ExternalAddress((address) foo))` and proceeds from there (see `generate_sha3_implCompress()`). To my eye that looks a bit more straightforward and the approach seems to be used more often than the adr approach in the file for defining constants. What I don't know is if one approach is better than the other for other reasons like performance or memory consumption. Do you have any feelings one way or the other?
-------------

PR: https://git.openjdk.org/jdk/pull/7702

From aph at openjdk.org  Mon Nov  7 07:34:13 2022
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 7 Nov 2022 07:34:13 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <2t37RDNwT7XZqqddyBSaUlzj21ERuVhy8XH3EN8RCb4=.e878bb9e-9b90-499e-8289-ac3abe6b62db@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
 <Beacw6Zz39Sfy-LwdkOi7Q9scH-i3fqtz_sVygbWdi0=.01208ee7-838d-4a6e-9de5-ad2561c5e337@github.com>
 <2t37RDNwT7XZqqddyBSaUlzj21ERuVhy8XH3EN8RCb4=.e878bb9e-9b90-499e-8289-ac3abe6b62db@github.com>
Message-ID: <9Y4BmZrQIbE94pTuo7vqgMi-v_EFVWpMOCwXVRTW2b4=.c2d5cc64-e4d3-4e7e-ab79-c4960e7209e9@github.com>

On Fri, 2 Sep 2022 16:52:02 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2521:
>>
>>> 2519: #undef INSN3
>>> 2520: #undef INSN4
>>> 2521:
>>
>> This code to handle the AdvSIMD load/store single structure and AdvSIMD load/store single structure (post-indexed) is excessive.
>>
>> Every one of these instructions has the format,
>>
>> `0|Q|0011010|L|R|00000|opcode|S|size|Rn|Rt`
>>
>> or
>>
>> `0|Q|0011011|L|R| Rm|opcode|S|size|Rn|Rt`
>>
>> Perhaps consider using a `RegSet regs` for the registers. Then the instruction encoding to use (1, 2, 3, or 4 consecutive registers) can be picked up from `regs.size()`. There only needs to be a single routine for all of the `ld_st` variants.
>
> Thanks for the suggestion. I will look into this. I can see how `regs.size()` could simplify these macros.

Another thing that may be better than a `RegSet`. If you use a C++11 template parameter pack, you can do something like this:

    template<typename R, typename... Rx>
    void foo(R first_register, Rx... more_registers) {
      const R regs[] = { first_register, more_registers... }; // An array that contains the more regs
      const int count = sizeof...(more_registers);            // The count of more regs
      ...
    }

And then you can use the same logic, regardless of the number of registers.

> What I don't know is if one approach is better than the other for other reasons like performance or memory consumption. Do you have any feelings one way or the other?

`ADR` is smaller and faster at runtime, `lea(reg, ExternalAddress((address) foo))` with `const uint64_t[] foo = { ... }` will be slightly faster at start-up time. It makes no sense to emit the table with `emit_data64()` then take the address of the table you've just emitted with `lea`. That's worse for startup time _and_ for runtime. So I don't much mind emitting the table at runtime, but if you do, get its address with `ADR`.

-------------

PR: https://git.openjdk.org/jdk/pull/7702

From dchuyko at openjdk.org  Mon Nov  7 07:34:15 2022
From: dchuyko at openjdk.org (Dmitry Chuyko)
Date: Mon, 7 Nov 2022 07:34:15 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
Message-ID: <07z1ueSVi_yO56oR5nT04JKrU_Vui1k7JsgxZrKEnbs=.16a0350b-1f76-4015-9419-035d26a78d82@github.com>

On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce.
> Intrinsics have been written for the following platforms and instruction sets:
>
> - x86_64: AVX, AVX2 and AVX512
> - aarch64: platforms that support the advanced SIMD instructions
>
> Microbenchmark results (Note: ChaCha20-Poly1305 numbers do not include the pending Poly1305 intrinsics to be delivered in #10582)
>
> [...]
>
> Special thanks to the folks who have made many helpful comments while this PR was in draft form.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4156:

> 4154:     // Decrement and iterate
> 4155:     __ subs(loopCtr, loopCtr, 1);
> 4156:     __ cmp(loopCtr, (u1)0);

CMP probably can be removed or can there be just SUB and CBNZ?
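
A minimal standalone sketch of the point above, assuming a simplified model that tracks only the Z flag. This is plain C++, not HotSpot assembler code; `Flags`, `subs`, `cmp_zero` and the `run_*` helpers are hypothetical names used only for illustration. It shows that `SUBS` already leaves Z set exactly when the counter reaches zero, so a following `CMP #0` recomputes the same condition, and that a flag-free `SUB` plus `CBNZ` loop runs the same number of iterations.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model: only the Z (zero) flag is tracked.
struct Flags {
    bool z = false;
};

// Models "SUBS rd, rn, #imm": subtract and update the Z flag.
inline uint32_t subs(uint32_t rn, uint32_t imm, Flags& flags) {
    const uint32_t result = rn - imm;
    flags.z = (result == 0);
    return result;
}

// Models "CMP rn, #0": Z is set exactly when rn is zero.
inline void cmp_zero(uint32_t rn, Flags& flags) {
    flags.z = (rn == 0);
}

// Loop shape under review: SUBS; CMP #0; B.NE.
inline int run_subs_cmp_bne(uint32_t loopCtr) {
    assert(loopCtr > 0);
    Flags flags;
    int iterations = 0;
    do {
        ++iterations;                       // the loop body would go here
        loopCtr = subs(loopCtr, 1, flags);
        cmp_zero(loopCtr, flags);           // recomputes the Z value SUBS produced
    } while (!flags.z);                     // B.NE: branch while Z is clear
    return iterations;
}

// Same loop without the CMP: SUBS; B.NE.
inline int run_subs_bne(uint32_t loopCtr) {
    assert(loopCtr > 0);
    Flags flags;
    int iterations = 0;
    do {
        ++iterations;
        loopCtr = subs(loopCtr, 1, flags);
    } while (!flags.z);                     // B.NE
    return iterations;
}

// Flag-free alternative: SUB; CBNZ.
inline int run_sub_cbnz(uint32_t loopCtr) {
    assert(loopCtr > 0);
    int iterations = 0;
    do {
        ++iterations;
        loopCtr = loopCtr - 1;              // SUB, flags untouched
    } while (loopCtr != 0);                 // CBNZ: branch while register is non-zero
    return iterations;
}
```

All three helpers return the same iteration count for any positive starting counter, which is the equivalence the review comment relies on.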
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4306:

> 4304:     __ subs(loopCtr, loopCtr, 1);
> 4305:     __ cmp(loopCtr, (u1)0);
> 4306:     __ br(Assembler::NE, L_twoRounds);

Same thing about subs-cmp0-bne.

-------------

PR: https://git.openjdk.org/jdk/pull/7702

From jnimeh at openjdk.org  Mon Nov  7 07:34:15 2022
From: jnimeh at openjdk.org (Jamil Nimeh)
Date: Mon, 7 Nov 2022 07:34:15 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <07z1ueSVi_yO56oR5nT04JKrU_Vui1k7JsgxZrKEnbs=.16a0350b-1f76-4015-9419-035d26a78d82@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
 <07z1ueSVi_yO56oR5nT04JKrU_Vui1k7JsgxZrKEnbs=.16a0350b-1f76-4015-9419-035d26a78d82@github.com>
Message-ID: <BCdk4pS1pfOgOe-i6plw1_EvJ8wLDUZKCCkilcg_xxA=.9cb99ab4-8beb-454d-9555-2b5ae5760673@github.com>

On Thu, 18 Aug 2022 14:26:47 GMT, Dmitry Chuyko <dchuyko at openjdk.org> wrote:

>> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets:
>>
>> - x86_64: AVX, AVX2 and AVX512
>> - aarch64: platforms that support the advanced SIMD instructions
>>
>> Microbenchmark results (Note: ChaCha20-Poly1305 numbers do not include the pending Poly1305 intrinsics to be delivered in #10582)
>>
>> [...]
>>
>> Special thanks to the folks who have made many helpful comments while this PR was in draft form.
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4156:
>
>> 4154:     // Decrement and iterate
>> 4155:     __ subs(loopCtr, loopCtr, 1);
>> 4156:     __ cmp(loopCtr, (u1)0);
>
> CMP probably can be removed or can there be just SUB and CBNZ?

See my comment on the similar note below. I will likely be removing this version of the intrinsic in favor of the _blockpar version. I really like that second version better as it removes the need for the two sets of lane shifting operations on each of the 10 iterations.

> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4306:
>
>> 4304:     __ subs(loopCtr, loopCtr, 1);
>> 4305:     __ cmp(loopCtr, (u1)0);
>> 4306:     __ br(Assembler::NE, L_twoRounds);
>
> Same thing about subs-cmp0-bne.

Thanks for the suggestion. I actually have a version of the _blockpar cc20 block function intrinsic that uses a C++ for-loop around the cc20_quarter_round macro calls to generate that portion of the stub.
I believe that effectively unrolls the loop in the resulting stub and removes the need for the subs, cmp and br for all 10 iterations. Right now the aarch64 has two versions of the same block function as I was play testing both. I will probably end up removing the _qr (quarter-round parallel) version and favor the _blockpar (block-parallel) version as they both are pretty comparable in terms of speed, but the block parallel version seems to be a little better. I'm always open to these other ways of handling the loop control as assembly is not my strong suit so I appreciate the suggestion! Interesting, I had not considered that. Thanks for pointing that out. I'm honestly not sure how to evaluate the impact of the generated code on the icache. I'll look at the logic surrounding the ghash processBlocks(_wide) code to see how that decision is made. I don't have an aversion to going back to an assembly-based loop using the suggestions that @dchuyko made and maybe that's the right choice if it means more compact code. 
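For readers following the loop-control discussion, here is a minimal sketch in plain C++ (not HotSpot stub code) of the ChaCha20 quarter round and the 10 double-round iterations as specified in RFC 8439. The names rotl32, quarter_round, double_round, run_rounds and check_quarter_round_vector are illustrative assumptions and are not taken from the patch. The rolled loop below counts down and tests against zero once per iteration, which is roughly what a SUB/CBNZ pair would do in a stub; a generator-side C++ for-loop around the macro calls would instead emit the round body 10 times.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Rotate a 32-bit word left by n bits (0 < n < 32).
constexpr std::uint32_t rotl32(std::uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

// ChaCha20 quarter round on four state words (RFC 8439, section 2.1).
inline void quarter_round(std::uint32_t& a, std::uint32_t& b,
                          std::uint32_t& c, std::uint32_t& d) {
    a += b; d ^= a; d = rotl32(d, 16);
    c += d; b ^= c; b = rotl32(b, 12);
    a += b; d ^= a; d = rotl32(d, 8);
    c += d; b ^= c; b = rotl32(b, 7);
}

// One double round: four column rounds, then four diagonal rounds.
inline void double_round(std::array<std::uint32_t, 16>& s) {
    quarter_round(s[0], s[4], s[8],  s[12]);
    quarter_round(s[1], s[5], s[9],  s[13]);
    quarter_round(s[2], s[6], s[10], s[14]);
    quarter_round(s[3], s[7], s[11], s[15]);
    quarter_round(s[0], s[5], s[10], s[15]);
    quarter_round(s[1], s[6], s[11], s[12]);
    quarter_round(s[2], s[7], s[8],  s[13]);
    quarter_round(s[3], s[4], s[9],  s[14]);
}

// Rolled loop: 10 double rounds, with one decrement-and-test per iteration.
inline void run_rounds(std::array<std::uint32_t, 16>& s) {
    for (int loop_ctr = 10; loop_ctr != 0; --loop_ctr) {
        double_round(s);
    }
}

// Checks the quarter round against the test vector in RFC 8439, section 2.1.1.
inline void check_quarter_round_vector() {
    std::uint32_t a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567;
    quarter_round(a, b, c, d);
    assert(a == 0xea2a92f4 && b == 0xcb1cf8ce && c == 0x4581472e && d == 0x5881c4bb);
}
```

Whether the rolled or the unrolled form is preferable in the real stub depends on the measured throughput and the resulting code size, which this sketch does not model.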
------------- PR: https://git.openjdk.org/jdk/pull/7702 From aph at openjdk.org Mon Nov 7 07:34:16 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 7 Nov 2022 07:34:16 GMT Subject: RFR: 8247645: ChaCha20 intrinsics In-Reply-To: <BCdk4pS1pfOgOe-i6plw1_EvJ8wLDUZKCCkilcg_xxA=.9cb99ab4-8beb-454d-9555-2b5ae5760673@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <07z1ueSVi_yO56oR5nT04JKrU_Vui1k7JsgxZrKEnbs=.16a0350b-1f76-4015-9419-035d26a78d82@github.com> <BCdk4pS1pfOgOe-i6plw1_EvJ8wLDUZKCCkilcg_xxA=.9cb99ab4-8beb-454d-9555-2b5ae5760673@github.com> Message-ID: <RS_tuxkIudqRFFwLEtUlzYnpSxsueee_Zu5SxAyv4nA=.f26a6f98-3533-4467-937c-261d1106fdb4@github.com> On Thu, 18 Aug 2022 14:43:51 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4306: >> >>> 4304: __ subs(loopCtr, loopCtr, 1); >>> 4305: __ cmp(loopCtr, (u1)0); >>> 4306: __ br(Assembler::NE, L_twoRounds); >> >> Same thing about subs-cmp0-bne. > > Thanks for the suggestion. I actually have a version of the _blockpar cc20 block function intrinsic that uses a C++ for-loop around the cc20_quarter_round macro calls to generate that portion of the stub. I believe that effectively unrolls the loop in the resulting stub and removes the need for the subs, cmp and br for all 10 iterations. Right now the aarch64 has two versions of the same block function as I was play testing both. I will probably end up removing the _qr (quarter-round parallel) version and favor the _blockpar (block-parallel) version as they both are pretty comparable in terms of speed, but the block parallel version seems to be a little better. > > I'm always open to these other ways of handling the loop control as assembly is not my strong suit so I appreciate the suggestion! Be careful about the code expansion. If you're not careful you'll blow away much of the icache for little benefit. 
For AES/GCM on AArch64 we have `generate_ghash_processBlocks()` and `generate_ghash_processBlocks_wide()`. We don't call the big one unless it's worth it. It all depends on how big the code turns out to be. > Interesting, I had not considered that. Thanks for pointing that out. I'm honestly not sure how to evaluate the impact of the generated code on the icache. I'll look at the logic surrounding the ghash processBlocks(_wide) code to see how that decision is made. I don't have an aversion to going back to an assembly-based loop using the suggestions that @dchuyko made and maybe that's the right choice if it means more compact code. It's not so complicated. If you can make the code smaller with negligible impact on throughput, do so. If not, don't. ------------- PR: https://git.openjdk.org/jdk/pull/7702 From jnimeh at openjdk.org Mon Nov 7 07:34:17 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Mon, 7 Nov 2022 07:34:17 GMT Subject: RFR: 8247645: ChaCha20 intrinsics In-Reply-To: <RS_tuxkIudqRFFwLEtUlzYnpSxsueee_Zu5SxAyv4nA=.f26a6f98-3533-4467-937c-261d1106fdb4@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <07z1ueSVi_yO56oR5nT04JKrU_Vui1k7JsgxZrKEnbs=.16a0350b-1f76-4015-9419-035d26a78d82@github.com> <BCdk4pS1pfOgOe-i6plw1_EvJ8wLDUZKCCkilcg_xxA=.9cb99ab4-8beb-454d-9555-2b5ae5760673@github.com> <RS_tuxkIudqRFFwLEtUlzYnpSxsueee_Zu5SxAyv4nA=.f26a6f98-3533-4467-937c-261d1106fdb4@github.com> Message-ID: <lsHRgtMSbhVGRklc3tY4BoJ_nNBIgRC-I8e-nT2myeI=.56815ba4-f880-460d-beb9-f9ff6ce302ce@github.com> On Fri, 16 Sep 2022 09:27:39 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Interesting, I had not considered that. Thanks for pointing that out. I'm honestly not sure how to evaluate the impact of the generated code on the icache. I'll look at the logic surrounding the ghash processBlocks(_wide) code to see how that decision is made.
I don't have an aversion to going back to an assembly-based loop using the suggestions that @dchuyko made and maybe that's the right choice if it means more compact code. > > It's not so complicated. if you can make the code smaller with negligible impact on throughput, do so. If not, don't. I really didn't see a noticeable impact on performance with the loop unrolled so I'm going with the SUB/CBNZ approach. Seems like it does the best job of keeping the generated stub smaller and still be a tiny bit more efficient than what I started with. As always, I appreciate the suggestions. ------------- PR: https://git.openjdk.org/jdk/pull/7702 From sviswanathan at openjdk.org Mon Nov 7 07:34:25 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 7 Nov 2022 07:34:25 GMT Subject: RFR: 8247645: ChaCha20 intrinsics In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> Message-ID: <e7YhTmbhucfO2m_KITHD4rcwuqAK1zdA-_V8Q2oAdLk=.3e88980c-1a19-426b-8444-db8697d29f8b@github.com> On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: > This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: > > - x86_64: AVX, AVX2 and AVX512 > - aarch64: platforms that support the advanced SIMD instructions > > Microbenchmark results (Note: ChaCha20-Poly1305 numbers do not include the pending Poly1305 intrinsics to be delivered in #10582) > > x86_64 > Processor: 4x Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz > > Java only (-XX:-UseChaCha20Intrinsics) > -------------------------------------- > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 772956.829 ? 4434.965 ops/s > ChaCha20.decrypt 1024 thrpt 40 230478.075 ? 
660.617 ops/s > ChaCha20.decrypt 4096 thrpt 40 61504.367 ? 187.485 ops/s > ChaCha20.decrypt 16384 thrpt 40 15671.893 ? 59.860 ops/s > ChaCha20.encrypt 256 thrpt 40 793708.698 ? 3587.562 ops/s > ChaCha20.encrypt 1024 thrpt 40 232413.842 ? 808.766 ops/s > ChaCha20.encrypt 4096 thrpt 40 61586.483 ? 94.821 ops/s > ChaCha20.encrypt 16384 thrpt 40 15749.637 ? 34.497 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 219991.514 ? 2117.364 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 101672.568 ? 1921.214 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 32582.073 ? 946.061 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 8485.793 ? 26.348 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 291605.327 ? 2893.898 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 121034.948 ? 2545.312 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 32657.343 ? 114.322 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 8527.834 ? 33.711 ops/s > > Intrinsics enabled (-XX:UseAVX=1) > --------------------------------- > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 1293211.662 ? 9833.892 ops/s > ChaCha20.decrypt 1024 thrpt 40 450135.559 ? 1614.303 ops/s > ChaCha20.decrypt 4096 thrpt 40 123675.797 ? 576.160 ops/s > ChaCha20.decrypt 16384 thrpt 40 31707.566 ? 93.988 ops/s > ChaCha20.encrypt 256 thrpt 40 1338667.215 ? 12012.240 ops/s > ChaCha20.encrypt 1024 thrpt 40 453682.363 ? 2559.322 ops/s > ChaCha20.encrypt 4096 thrpt 40 124785.645 ? 394.535 ops/s > ChaCha20.encrypt 16384 thrpt 40 31788.969 ? 90.770 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 250683.639 ? 3990.340 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 131000.144 ? 2895.410 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 45215.542 ? 1368.148 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 11879.307 ? 55.006 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 355255.774 ? 5397.267 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 156057.380 ? 4294.091 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 47016.845 ? 
1618.779 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 12113.919 ? 45.792 ops/s > > Intrinsics enabled (-XX:UseAVX=2) > --------------------------------- > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 1824729.604 ? 12130.198 ops/s > ChaCha20.decrypt 1024 thrpt 40 746024.477 ? 3921.472 ops/s > ChaCha20.decrypt 4096 thrpt 40 219662.823 ? 2128.901 ops/s > ChaCha20.decrypt 16384 thrpt 40 57198.868 ? 221.973 ops/s > ChaCha20.encrypt 256 thrpt 40 1893810.127 ? 21870.718 ops/s > ChaCha20.encrypt 1024 thrpt 40 758024.511 ? 5414.552 ops/s > ChaCha20.encrypt 4096 thrpt 40 224032.805 ? 935.309 ops/s > ChaCha20.encrypt 16384 thrpt 40 58112.296 ? 498.048 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 260529.149 ? 4298.662 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 144967.984 ? 4558.697 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 50047.575 ? 171.204 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 13976.999 ? 72.299 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 378971.408 ? 9324.721 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 179361.248 ? 7968.109 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 55727.145 ? 2860.765 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 14205.830 ? 59.411 ops/s > > Intrinsics enabled (-XX:UseAVX=3) > --------------------------------- > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 1182958.956 ? 7782.532 ops/s > ChaCha20.decrypt 1024 thrpt 40 1003530.400 ? 10315.996 ops/s > ChaCha20.decrypt 4096 thrpt 40 339428.341 ? 2376.804 ops/s > ChaCha20.decrypt 16384 thrpt 40 92903.498 ? 1112.425 ops/s > ChaCha20.encrypt 256 thrpt 40 1266584.736 ? 5101.597 ops/s > ChaCha20.encrypt 1024 thrpt 40 1059717.173 ? 9435.649 ops/s > ChaCha20.encrypt 4096 thrpt 40 350520.581 ? 2787.593 ops/s > ChaCha20.encrypt 16384 thrpt 40 95181.548 ? 1638.579 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 200722.479 ? 2045.896 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 124660.386 ? 
3869.517 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 44059.327 ? 143.765 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 12412.936 ? 54.845 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 274528.005 ? 2945.416 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 145146.188 ? 857.254 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 47045.637 ? 128.049 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 12643.929 ? 55.748 ops/s > > aarch64 > Processor: 2 x CPU implementer : 0x41, architecture: 8, variant : 0x3, > part : 0xd0c, revision : 1 > > Java only (-XX:-UseChaCha20Intrinsics) > -------------------------------------- > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 1301037.920 ? 1734.836 ops/s > ChaCha20.decrypt 1024 thrpt 40 387115.013 ? 1122.264 ops/s > ChaCha20.decrypt 4096 thrpt 40 102591.108 ? 229.456 ops/s > ChaCha20.decrypt 16384 thrpt 40 25878.583 ? 89.351 ops/s > ChaCha20.encrypt 256 thrpt 40 1332737.880 ? 2478.508 ops/s > ChaCha20.encrypt 1024 thrpt 40 390288.663 ? 2361.851 ops/s > ChaCha20.encrypt 4096 thrpt 40 101882.728 ? 744.907 ops/s > ChaCha20.encrypt 16384 thrpt 40 26001.888 ? 71.907 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 351189.393 ? 2209.148 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 142960.999 ? 361.619 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 42437.822 ? 85.557 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 11173.152 ? 24.969 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 444870.664 ? 12571.799 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 158481.143 ? 2149.208 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 43610.721 ? 282.795 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 11150.783 ? 27.911 ops/s > > Intrinsics enabled > ------------------ > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 1907215.648 ? 3163.767 ops/s > ChaCha20.decrypt 1024 thrpt 40 631804.007 ? 736.430 ops/s > ChaCha20.decrypt 4096 thrpt 40 172280.991 ? 
362.190 ops/s > ChaCha20.decrypt 16384 thrpt 40 44150.254 ? 98.927 ops/s > ChaCha20.encrypt 256 thrpt 40 1990050.859 ? 6380.625 ops/s > ChaCha20.encrypt 1024 thrpt 40 636574.405 ? 3332.471 ops/s > ChaCha20.encrypt 4096 thrpt 40 173258.615 ? 327.199 ops/s > ChaCha20.encrypt 16384 thrpt 40 44191.925 ? 72.996 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 360555.774 ? 1988.467 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 162093.489 ? 413.684 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 50799.888 ? 110.955 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 13560.165 ? 32.208 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 458079.724 ? 13746.235 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 188228.966 ? 3498.480 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 52665.733 ? 151.740 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 13606.192 ? 52.134 ops/s > > Special thanks to the folks who have made many helpful comments while this PR was in draft form. src/hotspot/cpu/x86/assembler_x86.cpp line 4994: > 4992: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : > 4993: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : > 4994: (vector_len == AVX_512bit ? VM_Version::supports_evex() : 0)), ""); VM_Version::supports_evex() here should be VM_Version::supports_avx512bw(). src/hotspot/cpu/x86/assembler_x86.cpp line 4996: > 4994: (vector_len == AVX_512bit ? VM_Version::supports_evex() : 0)), ""); > 4995: NOT_LP64(assert(VM_Version::supports_sse2(), "")); > 4996: InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); legacy mode here should be _legacy_mode_bw. src/hotspot/cpu/x86/assembler_x86.cpp line 5025: > 5023: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : > 5024: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : > 5025: (vector_len == AVX_512bit ? VM_Version::supports_evex() : 0)), ""); VM_Version::supports_evex() here should be VM_Version::supports_avx512bw(). 
src/hotspot/cpu/x86/assembler_x86.cpp line 5027: > 5025: (vector_len == AVX_512bit ? VM_Version::supports_evex() : 0)), ""); > 5026: NOT_LP64(assert(VM_Version::supports_sse2(), "")); > 5027: InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); legacy_mode here should be _legacy_mode_bw. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5682: > 5680: /* Add mask for 4-block ChaCha20 Block calculations */ > 5681: address chacha20_ctradd_avx512() { > 5682: __ align(CodeEntryAlignment); This could be __ align64(); src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5698: > 5696: /* Scatter mask for key stream output on AVX-512 */ > 5697: address chacha20_scmask_avx512() { > 5698: __ align(CodeEntryAlignment); This could be __ align64(); src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5728: > 5726: const XMMRegister zmm_cVec = xmm2; > 5727: const XMMRegister zmm_dVec = xmm3; > 5728: const XMMRegister zmm_scratch = xmm4; We could have 5 additional scratch registers zmm_s1 .. zmm_s5 (mapping to xmm5 ... xmm9) to keep values read from memory into registers. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5738: > 5736: __ evbroadcasti32x4(zmm_bVec, Address(state, 16), Assembler::AVX_512bit); > 5737: __ evbroadcasti32x4(zmm_cVec, Address(state, 32), Assembler::AVX_512bit); > 5738: __ evbroadcasti32x4(zmm_dVec, Address(state, 48), Assembler::AVX_512bit); zmm_aVec to zmm_dVec could be copied into zmm_s1 to zmm_s4 respectively thereby eliminating broadcast needed later. 
For example: __ evmovdquq(zmm_s1, zmm_aVec, Assembler::AVX_512bit); src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5740: > 5738: __ evbroadcasti32x4(zmm_dVec, Address(state, 48), Assembler::AVX_512bit); > 5739: > 5740: __ vpaddd(zmm_dVec, zmm_dVec, ExternalAddress(StubRoutines::x86::chacha20_counter_addmask_avx512()), Assembler::AVX_512bit, rax); The chacha20_counter_addmask_avx512() could be preloaded into zmm_s5 before line 5735 as follows: __ evmovdquq(zmm_s5, ExternalAddress(StubRoutines::x86::chacha20_counter_addmask_avx512()), Assembler::AVX_512bit, rax); vpaddd can then use zmm_s5 also the later usage could use zmm_s5 directly. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5827: > 5825: __ evbroadcasti32x4(zmm_scratch, Address(state, 48), Assembler::AVX_512bit); > 5826: __ vpaddd(zmm_dVec, zmm_dVec, zmm_scratch, Assembler::AVX_512bit); > 5827: __ vpaddd(zmm_dVec, zmm_dVec, ExternalAddress(StubRoutines::x86::chacha20_counter_addmask_avx512()), Assembler::AVX_512bit, rax); These could directly use the values in zmm_s1 to zmm_s5 registers : __ vpaddd(zmm_aVec, zmm_aVec, zmm_s1, Assembler::AVX_512bit); ... 
__ vpaddd(zmm_dVec, zmm_dVec, zmm_s5, Assembler::AVX_512bit); src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5842: > 5840: __ evpscatterdd(Address(result, zmm_scratch, Address::times_4, 32), writeMask, zmm_cVec, Assembler::AVX_512bit); > 5841: __ knotwl(writeMask, writeMask); > 5842: __ evpscatterdd(Address(result, zmm_scratch, Address::times_4, 48), writeMask, zmm_dVec, Assembler::AVX_512bit); Using the vextracti32x4 instead of evpscatterdd would give better performance: __ vextracti32x4(Address(result, 0), zmm_aVec, 0); __ vextracti32x4(Address(result, 64), zmm_aVec, 1); __ vextracti32x4(Address(result, 128), zmm_aVec, 2); __ vextracti32x4(Address(result, 192), zmm_aVec, 3); __ vextracti32x4(Address(result, 16), zmm_bVec, 0); __ vextracti32x4(Address(result, 80), zmm_bVec, 1); __ vextracti32x4(Address(result, 144), zmm_bVec, 2); __ vextracti32x4(Address(result, 208), zmm_bVec, 3); __ vextracti32x4(Address(result, 32), zmm_cVec, 0); __ vextracti32x4(Address(result, 96), zmm_cVec, 1); __ vextracti32x4(Address(result, 160), zmm_cVec, 2); __ vextracti32x4(Address(result, 224), zmm_cVec, 3); __ vextracti32x4(Address(result, 48), zmm_dVec, 0); __ vextracti32x4(Address(result, 112), zmm_dVec, 1); __ vextracti32x4(Address(result, 176), zmm_dVec, 2); __ vextracti32x4(Address(result, 240), zmm_dVec, 3); ------------- PR: https://git.openjdk.org/jdk/pull/7702 From jnimeh at openjdk.org Mon Nov 7 07:34:26 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Mon, 7 Nov 2022 07:34:26 GMT Subject: RFR: 8247645: ChaCha20 intrinsics In-Reply-To: <e7YhTmbhucfO2m_KITHD4rcwuqAK1zdA-_V8Q2oAdLk=.3e88980c-1a19-426b-8444-db8697d29f8b@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <e7YhTmbhucfO2m_KITHD4rcwuqAK1zdA-_V8Q2oAdLk=.3e88980c-1a19-426b-8444-db8697d29f8b@github.com> Message-ID: <5vHrtG9EzPVw4Rzda3-H85Upr1jANl472BED7D38Vw4=.38e54240-afe5-449b-b357-adfe99ed3839@github.com> On Wed, 16 Mar 2022 00:48:17 GMT, 
Sandhya Viswanathan <sviswanathan at openjdk.org> wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: >> >> - x86_64: AVX, AVX2 and AVX512 >> - aarch64: platforms that support the advanced SIMD instructions >> >> Microbenchmark results (Note: ChaCha20-Poly1305 numbers do not include the pending Poly1305 intrinsics to be delivered in #10582) >> >> x86_64 >> Processor: 4x Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz >> >> Java only (-XX:-UseChaCha20Intrinsics) >> -------------------------------------- >> Benchmark (dataSize) Mode Cnt Score Error Units >> ChaCha20.decrypt 256 thrpt 40 772956.829 ? 4434.965 ops/s >> ChaCha20.decrypt 1024 thrpt 40 230478.075 ? 660.617 ops/s >> ChaCha20.decrypt 4096 thrpt 40 61504.367 ? 187.485 ops/s >> ChaCha20.decrypt 16384 thrpt 40 15671.893 ? 59.860 ops/s >> ChaCha20.encrypt 256 thrpt 40 793708.698 ? 3587.562 ops/s >> ChaCha20.encrypt 1024 thrpt 40 232413.842 ? 808.766 ops/s >> ChaCha20.encrypt 4096 thrpt 40 61586.483 ? 94.821 ops/s >> ChaCha20.encrypt 16384 thrpt 40 15749.637 ? 34.497 ops/s >> >> ChaCha20Poly1305.decrypt 256 thrpt 40 219991.514 ? 2117.364 ops/s >> ChaCha20Poly1305.decrypt 1024 thrpt 40 101672.568 ? 1921.214 ops/s >> ChaCha20Poly1305.decrypt 4096 thrpt 40 32582.073 ? 946.061 ops/s >> ChaCha20Poly1305.decrypt 16384 thrpt 40 8485.793 ? 26.348 ops/s >> ChaCha20Poly1305.encrypt 256 thrpt 40 291605.327 ? 2893.898 ops/s >> ChaCha20Poly1305.encrypt 1024 thrpt 40 121034.948 ? 2545.312 ops/s >> ChaCha20Poly1305.encrypt 4096 thrpt 40 32657.343 ? 114.322 ops/s >> ChaCha20Poly1305.encrypt 16384 thrpt 40 8527.834 ? 33.711 ops/s >> >> Intrinsics enabled (-XX:UseAVX=1) >> --------------------------------- >> Benchmark (dataSize) Mode Cnt Score Error Units >> ChaCha20.decrypt 256 thrpt 40 1293211.662 ? 
9833.892 ops/s >> ChaCha20.decrypt 1024 thrpt 40 450135.559 ? 1614.303 ops/s >> ChaCha20.decrypt 4096 thrpt 40 123675.797 ? 576.160 ops/s >> ChaCha20.decrypt 16384 thrpt 40 31707.566 ? 93.988 ops/s >> ChaCha20.encrypt 256 thrpt 40 1338667.215 ? 12012.240 ops/s >> ChaCha20.encrypt 1024 thrpt 40 453682.363 ? 2559.322 ops/s >> ChaCha20.encrypt 4096 thrpt 40 124785.645 ? 394.535 ops/s >> ChaCha20.encrypt 16384 thrpt 40 31788.969 ? 90.770 ops/s >> >> ChaCha20Poly1305.decrypt 256 thrpt 40 250683.639 ? 3990.340 ops/s >> ChaCha20Poly1305.decrypt 1024 thrpt 40 131000.144 ? 2895.410 ops/s >> ChaCha20Poly1305.decrypt 4096 thrpt 40 45215.542 ? 1368.148 ops/s >> ChaCha20Poly1305.decrypt 16384 thrpt 40 11879.307 ? 55.006 ops/s >> ChaCha20Poly1305.encrypt 256 thrpt 40 355255.774 ? 5397.267 ops/s >> ChaCha20Poly1305.encrypt 1024 thrpt 40 156057.380 ? 4294.091 ops/s >> ChaCha20Poly1305.encrypt 4096 thrpt 40 47016.845 ? 1618.779 ops/s >> ChaCha20Poly1305.encrypt 16384 thrpt 40 12113.919 ? 45.792 ops/s >> >> Intrinsics enabled (-XX:UseAVX=2) >> --------------------------------- >> Benchmark (dataSize) Mode Cnt Score Error Units >> ChaCha20.decrypt 256 thrpt 40 1824729.604 ? 12130.198 ops/s >> ChaCha20.decrypt 1024 thrpt 40 746024.477 ? 3921.472 ops/s >> ChaCha20.decrypt 4096 thrpt 40 219662.823 ? 2128.901 ops/s >> ChaCha20.decrypt 16384 thrpt 40 57198.868 ? 221.973 ops/s >> ChaCha20.encrypt 256 thrpt 40 1893810.127 ? 21870.718 ops/s >> ChaCha20.encrypt 1024 thrpt 40 758024.511 ? 5414.552 ops/s >> ChaCha20.encrypt 4096 thrpt 40 224032.805 ? 935.309 ops/s >> ChaCha20.encrypt 16384 thrpt 40 58112.296 ? 498.048 ops/s >> >> ChaCha20Poly1305.decrypt 256 thrpt 40 260529.149 ? 4298.662 ops/s >> ChaCha20Poly1305.decrypt 1024 thrpt 40 144967.984 ? 4558.697 ops/s >> ChaCha20Poly1305.decrypt 4096 thrpt 40 50047.575 ? 171.204 ops/s >> ChaCha20Poly1305.decrypt 16384 thrpt 40 13976.999 ? 72.299 ops/s >> ChaCha20Poly1305.encrypt 256 thrpt 40 378971.408 ? 
9324.721 ops/s >> ChaCha20Poly1305.encrypt 1024 thrpt 40 179361.248 ? 7968.109 ops/s >> ChaCha20Poly1305.encrypt 4096 thrpt 40 55727.145 ? 2860.765 ops/s >> ChaCha20Poly1305.encrypt 16384 thrpt 40 14205.830 ? 59.411 ops/s >> >> Intrinsics enabled (-XX:UseAVX=3) >> --------------------------------- >> Benchmark (dataSize) Mode Cnt Score Error Units >> ChaCha20.decrypt 256 thrpt 40 1182958.956 ? 7782.532 ops/s >> ChaCha20.decrypt 1024 thrpt 40 1003530.400 ? 10315.996 ops/s >> ChaCha20.decrypt 4096 thrpt 40 339428.341 ? 2376.804 ops/s >> ChaCha20.decrypt 16384 thrpt 40 92903.498 ? 1112.425 ops/s >> ChaCha20.encrypt 256 thrpt 40 1266584.736 ? 5101.597 ops/s >> ChaCha20.encrypt 1024 thrpt 40 1059717.173 ? 9435.649 ops/s >> ChaCha20.encrypt 4096 thrpt 40 350520.581 ? 2787.593 ops/s >> ChaCha20.encrypt 16384 thrpt 40 95181.548 ? 1638.579 ops/s >> >> ChaCha20Poly1305.decrypt 256 thrpt 40 200722.479 ? 2045.896 ops/s >> ChaCha20Poly1305.decrypt 1024 thrpt 40 124660.386 ? 3869.517 ops/s >> ChaCha20Poly1305.decrypt 4096 thrpt 40 44059.327 ? 143.765 ops/s >> ChaCha20Poly1305.decrypt 16384 thrpt 40 12412.936 ? 54.845 ops/s >> ChaCha20Poly1305.encrypt 256 thrpt 40 274528.005 ? 2945.416 ops/s >> ChaCha20Poly1305.encrypt 1024 thrpt 40 145146.188 ? 857.254 ops/s >> ChaCha20Poly1305.encrypt 4096 thrpt 40 47045.637 ? 128.049 ops/s >> ChaCha20Poly1305.encrypt 16384 thrpt 40 12643.929 ? 55.748 ops/s >> >> aarch64 >> Processor: 2 x CPU implementer : 0x41, architecture: 8, variant : 0x3, >> part : 0xd0c, revision : 1 >> >> Java only (-XX:-UseChaCha20Intrinsics) >> -------------------------------------- >> Benchmark (dataSize) Mode Cnt Score Error Units >> ChaCha20.decrypt 256 thrpt 40 1301037.920 ? 1734.836 ops/s >> ChaCha20.decrypt 1024 thrpt 40 387115.013 ? 1122.264 ops/s >> ChaCha20.decrypt 4096 thrpt 40 102591.108 ? 229.456 ops/s >> ChaCha20.decrypt 16384 thrpt 40 25878.583 ? 89.351 ops/s >> ChaCha20.encrypt 256 thrpt 40 1332737.880 ? 
2478.508 ops/s >> ChaCha20.encrypt 1024 thrpt 40 390288.663 ? 2361.851 ops/s >> ChaCha20.encrypt 4096 thrpt 40 101882.728 ? 744.907 ops/s >> ChaCha20.encrypt 16384 thrpt 40 26001.888 ? 71.907 ops/s >> >> ChaCha20Poly1305.decrypt 256 thrpt 40 351189.393 ? 2209.148 ops/s >> ChaCha20Poly1305.decrypt 1024 thrpt 40 142960.999 ? 361.619 ops/s >> ChaCha20Poly1305.decrypt 4096 thrpt 40 42437.822 ? 85.557 ops/s >> ChaCha20Poly1305.decrypt 16384 thrpt 40 11173.152 ? 24.969 ops/s >> ChaCha20Poly1305.encrypt 256 thrpt 40 444870.664 ? 12571.799 ops/s >> ChaCha20Poly1305.encrypt 1024 thrpt 40 158481.143 ? 2149.208 ops/s >> ChaCha20Poly1305.encrypt 4096 thrpt 40 43610.721 ? 282.795 ops/s >> ChaCha20Poly1305.encrypt 16384 thrpt 40 11150.783 ? 27.911 ops/s >> >> Intrinsics enabled >> ------------------ >> Benchmark (dataSize) Mode Cnt Score Error Units >> ChaCha20.decrypt 256 thrpt 40 1907215.648 ? 3163.767 ops/s >> ChaCha20.decrypt 1024 thrpt 40 631804.007 ? 736.430 ops/s >> ChaCha20.decrypt 4096 thrpt 40 172280.991 ? 362.190 ops/s >> ChaCha20.decrypt 16384 thrpt 40 44150.254 ? 98.927 ops/s >> ChaCha20.encrypt 256 thrpt 40 1990050.859 ? 6380.625 ops/s >> ChaCha20.encrypt 1024 thrpt 40 636574.405 ? 3332.471 ops/s >> ChaCha20.encrypt 4096 thrpt 40 173258.615 ? 327.199 ops/s >> ChaCha20.encrypt 16384 thrpt 40 44191.925 ? 72.996 ops/s >> >> ChaCha20Poly1305.decrypt 256 thrpt 40 360555.774 ? 1988.467 ops/s >> ChaCha20Poly1305.decrypt 1024 thrpt 40 162093.489 ? 413.684 ops/s >> ChaCha20Poly1305.decrypt 4096 thrpt 40 50799.888 ? 110.955 ops/s >> ChaCha20Poly1305.decrypt 16384 thrpt 40 13560.165 ? 32.208 ops/s >> ChaCha20Poly1305.encrypt 256 thrpt 40 458079.724 ? 13746.235 ops/s >> ChaCha20Poly1305.encrypt 1024 thrpt 40 188228.966 ? 3498.480 ops/s >> ChaCha20Poly1305.encrypt 4096 thrpt 40 52665.733 ? 151.740 ops/s >> ChaCha20Poly1305.encrypt 16384 thrpt 40 13606.192 ? 52.134 ops/s >> >> Special thanks to the folks who have made many helpful comments while this PR was in draft form. 
> > src/hotspot/cpu/x86/assembler_x86.cpp line 5027: > >> 5025: (vector_len == AVX_512bit ? VM_Version::supports_evex() : 0)), ""); >> 5026: NOT_LP64(assert(VM_Version::supports_sse2(), "")); >> 5027: InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); > > legacy_mode here should be _legacy_mode_bw. Good catch, fixed, along with all the other similar findings below. > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5682: > >> 5680: /* Add mask for 4-block ChaCha20 Block calculations */ >> 5681: address chacha20_ctradd_avx512() { >> 5682: __ align(CodeEntryAlignment); > > This could be __ align64(); Done > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5698: > >> 5696: /* Scatter mask for key stream output on AVX-512 */ >> 5697: address chacha20_scmask_avx512() { >> 5698: __ align(CodeEntryAlignment); > > This could be __ align64(); Done > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5728: > >> 5726: const XMMRegister zmm_cVec = xmm2; >> 5727: const XMMRegister zmm_dVec = xmm3; >> 5728: const XMMRegister zmm_scratch = xmm4; > > We could have 5 additional scratch registers zmm_s1 .. zmm_s5 (mapping to xmm5 ... xmm9) to keep values read from memory into registers. For AVX-512 I was able to get it to work with 4 scratch registers fortunately. For AVX and AVX2 I think the same approach can work, but since there are no lanewise bit rotation instructions (just L/R shifts) that I can find I need a 5th scratch register. For the 32-bit version it is a little more complicated as there are only 8 SIMD registers to work with. I think even there I could simply read the state from memory for one memory-to-register add instead of doing 4, and then hold the other 128-bit state lines on 3 scratch registers. I'm going to experiment with that a bit to see how much I can limit memory fetches to get some improvements on both 64-bit and 32-bit. 
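As a rough illustration of why a rotate built from shifts costs an extra register, the sketch below models one 128-bit vector as four 32-bit lanes in plain C++. It is not the stub code; Lanes and rotate_left_lanes are hypothetical names chosen for this example. Without a lane-wise rotate instruction, each rotation needs a right shift into a temporary, a left shift, and an OR to combine them, and the temporary is what occupies the additional scratch register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Models one 128-bit vector holding four 32-bit lanes.
using Lanes = std::array<std::uint32_t, 4>;

// Lane-wise rotate left by n bits, built only from shifts and OR.
// The 'scratch' value stands in for the extra vector register.
inline Lanes rotate_left_lanes(Lanes v, int n) {
    assert(n > 0 && n < 32);
    Lanes scratch{};
    for (std::size_t i = 0; i < v.size(); ++i) {
        scratch[i] = v[i] >> (32 - n);  // high bits that wrap around
        v[i] = v[i] << n;               // low bits shifted up
        v[i] |= scratch[i];             // combine the two halves
    }
    return v;
}
```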

> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5738:
>
>> 5736: __ evbroadcasti32x4(zmm_bVec, Address(state, 16), Assembler::AVX_512bit);
>> 5737: __ evbroadcasti32x4(zmm_cVec, Address(state, 32), Assembler::AVX_512bit);
>> 5738: __ evbroadcasti32x4(zmm_dVec, Address(state, 48), Assembler::AVX_512bit);
>
> zmm_aVec to zmm_dVec could be copied into zmm_s1 to zmm_s4 respectively, thereby eliminating the broadcasts needed later. For example:
> __ evmovdquq(zmm_s1, zmm_aVec, Assembler::AVX_512bit);

A good suggestion, this has been changed.

> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5740:
>
>> 5738: __ evbroadcasti32x4(zmm_dVec, Address(state, 48), Assembler::AVX_512bit);
>> 5739:
>> 5740: __ vpaddd(zmm_dVec, zmm_dVec, ExternalAddress(StubRoutines::x86::chacha20_counter_addmask_avx512()), Assembler::AVX_512bit, rax);
>
> The chacha20_counter_addmask_avx512() could be preloaded into zmm_s5 before line 5735 as follows:
> __ evmovdquq(zmm_s5, ExternalAddress(StubRoutines::x86::chacha20_counter_addmask_avx512()), Assembler::AVX_512bit, rax);
> vpaddd can then use zmm_s5, and the later usage could use zmm_s5 directly.

Another good improvement, done.

> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5827:
>
>> 5825: __ evbroadcasti32x4(zmm_scratch, Address(state, 48), Assembler::AVX_512bit);
>> 5826: __ vpaddd(zmm_dVec, zmm_dVec, zmm_scratch, Assembler::AVX_512bit);
>> 5827: __ vpaddd(zmm_dVec, zmm_dVec, ExternalAddress(StubRoutines::x86::chacha20_counter_addmask_avx512()), Assembler::AVX_512bit, rax);
>
> These could directly use the values in the zmm_s1 to zmm_s5 registers:
> __ vpaddd(zmm_aVec, zmm_aVec, zmm_s1, Assembler::AVX_512bit);
> ...
> __ vpaddd(zmm_dVec, zmm_dVec, zmm_s5, Assembler::AVX_512bit);

Keeping the original broadcasted state data on registers was a good idea, as it saved me the extra reach out to memory at the end of the loop. Fixed as recommended.

> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5842:
>
>> 5840: __ evpscatterdd(Address(result, zmm_scratch, Address::times_4, 32), writeMask, zmm_cVec, Assembler::AVX_512bit);
>> 5841: __ knotwl(writeMask, writeMask);
>> 5842: __ evpscatterdd(Address(result, zmm_scratch, Address::times_4, 48), writeMask, zmm_dVec, Assembler::AVX_512bit);
>
> Using vextracti32x4 instead of evpscatterdd would give better performance:
> __ vextracti32x4(Address(result, 0), zmm_aVec, 0);
> __ vextracti32x4(Address(result, 64), zmm_aVec, 1);
> __ vextracti32x4(Address(result, 128), zmm_aVec, 2);
> __ vextracti32x4(Address(result, 192), zmm_aVec, 3);
> __ vextracti32x4(Address(result, 16), zmm_bVec, 0);
> __ vextracti32x4(Address(result, 80), zmm_bVec, 1);
> __ vextracti32x4(Address(result, 144), zmm_bVec, 2);
> __ vextracti32x4(Address(result, 208), zmm_bVec, 3);
> __ vextracti32x4(Address(result, 32), zmm_cVec, 0);
> __ vextracti32x4(Address(result, 96), zmm_cVec, 1);
> __ vextracti32x4(Address(result, 160), zmm_cVec, 2);
> __ vextracti32x4(Address(result, 224), zmm_cVec, 3);
> __ vextracti32x4(Address(result, 48), zmm_dVec, 0);
> __ vextracti32x4(Address(result, 112), zmm_dVec, 1);
> __ vextracti32x4(Address(result, 176), zmm_dVec, 2);
> __ vextracti32x4(Address(result, 240), zmm_dVec, 3);

I have been wondering about this approach for a while now, since I did something similar for the AVX2 version. I had assumed that evpscatterdd used fewer instructions and therefore would be more efficient, but I'm more than happy to move to the vextracti32x4 approach. I'll be eager to see how it impacts performance along with the increased storage of intermediate data on additional XMMRegister objects.
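
The store offsets in the suggested vextracti32x4 sequence follow a regular pattern, which a small sketch can make explicit. It assumes, based only on the listing above and not on the final patch, that each 128-bit lane of zmm_aVec..zmm_dVec holds one 16-byte state row of one 64-byte keystream block. The class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical helper, not part of the PR: reproduces the byte offsets used in
// the vextracti32x4 listing quoted above.
class ExtractOffsets {
    static final int ROW_BYTES = 16;    // one 128-bit lane holds one state row
    static final int BLOCK_BYTES = 64;  // one ChaCha20 keystream block

    // row: 0..3 for zmm_aVec..zmm_dVec; lane: 0..3, the 128-bit lane index
    static int offset(int row, int lane) {
        return lane * BLOCK_BYTES + row * ROW_BYTES;
    }

    public static void main(String[] args) {
        String[] regs = {"zmm_aVec", "zmm_bVec", "zmm_cVec", "zmm_dVec"};
        for (int row = 0; row < regs.length; row++) {
            for (int lane = 0; lane < 4; lane++) {
                // Prints one store per (register, lane) pair in listing order
                System.out.printf("__ vextracti32x4(Address(result, %d), %s, %d);%n",
                        offset(row, lane), regs[row], lane);
            }
        }
    }
}
```

Under that assumption, lane k of every register lands in output block k, so the sixteen stores together fill 256 contiguous bytes of the result buffer.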

-------------

PR: https://git.openjdk.org/jdk/pull/7702

From vlivanov at openjdk.org Mon Nov 7 07:34:29 2022
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 7 Nov 2022 07:34:29 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
Message-ID: <bfvITmE9Ikr80Nagm_5JQuvxK7QoU_sXFGYgnMLfZgQ=.87249f91-ca3f-4529-a1a4-cb642480b0da@github.com>

On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets:
>
> - x86_64: AVX, AVX2 and AVX512
> - aarch64: platforms that support the advanced SIMD instructions
>
> Microbenchmark results (Note: ChaCha20-Poly1305 numbers do not include the pending Poly1305 intrinsics to be delivered in #10582)
>
> x86_64
> Processor: 4x Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz
>
> Java only (-XX:-UseChaCha20Intrinsics)
> --------------------------------------
> Benchmark (dataSize) Mode Cnt Score Error Units
> ChaCha20.decrypt 256 thrpt 40 772956.829 ± 4434.965 ops/s
> ChaCha20.decrypt 1024 thrpt 40 230478.075 ± 660.617 ops/s
> ChaCha20.decrypt 4096 thrpt 40 61504.367 ± 187.485 ops/s
> ChaCha20.decrypt 16384 thrpt 40 15671.893 ± 59.860 ops/s
> ChaCha20.encrypt 256 thrpt 40 793708.698 ± 3587.562 ops/s
> ChaCha20.encrypt 1024 thrpt 40 232413.842 ± 808.766 ops/s
> ChaCha20.encrypt 4096 thrpt 40 61586.483 ± 94.821 ops/s
> ChaCha20.encrypt 16384 thrpt 40 15749.637 ± 34.497 ops/s
>
> ChaCha20Poly1305.decrypt 256 thrpt 40 219991.514 ± 2117.364 ops/s
> ChaCha20Poly1305.decrypt 1024 thrpt 40 101672.568 ± 1921.214 ops/s
> ChaCha20Poly1305.decrypt 4096 thrpt 40 32582.073 ± 946.061 ops/s
> ChaCha20Poly1305.decrypt 16384 thrpt 40 8485.793 ± 26.348 ops/s
> ChaCha20Poly1305.encrypt 256 thrpt 40 291605.327 ± 2893.898 ops/s
> ChaCha20Poly1305.encrypt 1024 thrpt 40 121034.948 ± 2545.312 ops/s
> ChaCha20Poly1305.encrypt 4096 thrpt 40 32657.343 ± 114.322 ops/s
> ChaCha20Poly1305.encrypt 16384 thrpt 40 8527.834 ± 33.711 ops/s
>
> Intrinsics enabled (-XX:UseAVX=1)
> ---------------------------------
> Benchmark (dataSize) Mode Cnt Score Error Units
> ChaCha20.decrypt 256 thrpt 40 1293211.662 ± 9833.892 ops/s
> ChaCha20.decrypt 1024 thrpt 40 450135.559 ± 1614.303 ops/s
> ChaCha20.decrypt 4096 thrpt 40 123675.797 ± 576.160 ops/s
> ChaCha20.decrypt 16384 thrpt 40 31707.566 ± 93.988 ops/s
> ChaCha20.encrypt 256 thrpt 40 1338667.215 ± 12012.240 ops/s
> ChaCha20.encrypt 1024 thrpt 40 453682.363 ± 2559.322 ops/s
> ChaCha20.encrypt 4096 thrpt 40 124785.645 ± 394.535 ops/s
> ChaCha20.encrypt 16384 thrpt 40 31788.969 ± 90.770 ops/s
>
> ChaCha20Poly1305.decrypt 256 thrpt 40 250683.639 ± 3990.340 ops/s
> ChaCha20Poly1305.decrypt 1024 thrpt 40 131000.144 ± 2895.410 ops/s
> ChaCha20Poly1305.decrypt 4096 thrpt 40 45215.542 ± 1368.148 ops/s
> ChaCha20Poly1305.decrypt 16384 thrpt 40 11879.307 ± 55.006 ops/s
> ChaCha20Poly1305.encrypt 256 thrpt 40 355255.774 ± 5397.267 ops/s
> ChaCha20Poly1305.encrypt 1024 thrpt 40 156057.380 ± 4294.091 ops/s
> ChaCha20Poly1305.encrypt 4096 thrpt 40 47016.845 ± 1618.779 ops/s
> ChaCha20Poly1305.encrypt 16384 thrpt 40 12113.919 ± 45.792 ops/s
>
> Intrinsics enabled (-XX:UseAVX=2)
> ---------------------------------
> Benchmark (dataSize) Mode Cnt Score Error Units
> ChaCha20.decrypt 256 thrpt 40 1824729.604 ± 12130.198 ops/s
> ChaCha20.decrypt 1024 thrpt 40 746024.477 ± 3921.472 ops/s
> ChaCha20.decrypt 4096 thrpt 40 219662.823 ± 2128.901 ops/s
> ChaCha20.decrypt 16384 thrpt 40 57198.868 ± 221.973 ops/s
> ChaCha20.encrypt 256 thrpt 40 1893810.127 ± 21870.718 ops/s
> ChaCha20.encrypt 1024 thrpt 40 758024.511 ± 5414.552 ops/s
> ChaCha20.encrypt 4096 thrpt 40 224032.805 ± 935.309 ops/s
> ChaCha20.encrypt 16384 thrpt 40 58112.296 ± 498.048 ops/s
>
> ChaCha20Poly1305.decrypt 256 thrpt 40 260529.149 ± 4298.662 ops/s
> ChaCha20Poly1305.decrypt 1024 thrpt 40 144967.984 ± 4558.697 ops/s
> ChaCha20Poly1305.decrypt 4096 thrpt 40 50047.575 ± 171.204 ops/s
> ChaCha20Poly1305.decrypt 16384 thrpt 40 13976.999 ± 72.299 ops/s
> ChaCha20Poly1305.encrypt 256 thrpt 40 378971.408 ± 9324.721 ops/s
> ChaCha20Poly1305.encrypt 1024 thrpt 40 179361.248 ± 7968.109 ops/s
> ChaCha20Poly1305.encrypt 4096 thrpt 40 55727.145 ± 2860.765 ops/s
> ChaCha20Poly1305.encrypt 16384 thrpt 40 14205.830 ± 59.411 ops/s
>
> Intrinsics enabled (-XX:UseAVX=3)
> ---------------------------------
> Benchmark (dataSize) Mode Cnt Score Error Units
> ChaCha20.decrypt 256 thrpt 40 1182958.956 ± 7782.532 ops/s
> ChaCha20.decrypt 1024 thrpt 40 1003530.400 ± 10315.996 ops/s
> ChaCha20.decrypt 4096 thrpt 40 339428.341 ± 2376.804 ops/s
> ChaCha20.decrypt 16384 thrpt 40 92903.498 ± 1112.425 ops/s
> ChaCha20.encrypt 256 thrpt 40 1266584.736 ± 5101.597 ops/s
> ChaCha20.encrypt 1024 thrpt 40 1059717.173 ± 9435.649 ops/s
> ChaCha20.encrypt 4096 thrpt 40 350520.581 ± 2787.593 ops/s
> ChaCha20.encrypt 16384 thrpt 40 95181.548 ± 1638.579 ops/s
>
> ChaCha20Poly1305.decrypt 256 thrpt 40 200722.479 ± 2045.896 ops/s
> ChaCha20Poly1305.decrypt 1024 thrpt 40 124660.386 ± 3869.517 ops/s
> ChaCha20Poly1305.decrypt 4096 thrpt 40 44059.327 ± 143.765 ops/s
> ChaCha20Poly1305.decrypt 16384 thrpt 40 12412.936 ± 54.845 ops/s
> ChaCha20Poly1305.encrypt 256 thrpt 40 274528.005 ± 2945.416 ops/s
> ChaCha20Poly1305.encrypt 1024 thrpt 40 145146.188 ± 857.254 ops/s
> ChaCha20Poly1305.encrypt 4096 thrpt 40 47045.637 ± 128.049 ops/s
> ChaCha20Poly1305.encrypt 16384 thrpt 40 12643.929 ± 55.748 ops/s
>
> aarch64
> Processor: 2 x CPU implementer : 0x41, architecture: 8, variant : 0x3,
> part : 0xd0c, revision : 1
>
> Java only (-XX:-UseChaCha20Intrinsics)
> --------------------------------------
> Benchmark (dataSize) Mode Cnt Score Error Units
> ChaCha20.decrypt 256 thrpt 40 1301037.920 ± 1734.836 ops/s
> ChaCha20.decrypt 1024 thrpt 40 387115.013 ± 1122.264 ops/s
> ChaCha20.decrypt 4096 thrpt 40 102591.108 ± 229.456 ops/s
> ChaCha20.decrypt 16384 thrpt 40 25878.583 ± 89.351 ops/s
> ChaCha20.encrypt 256 thrpt 40 1332737.880 ± 2478.508 ops/s
> ChaCha20.encrypt 1024 thrpt 40 390288.663 ± 2361.851 ops/s
> ChaCha20.encrypt 4096 thrpt 40 101882.728 ± 744.907 ops/s
> ChaCha20.encrypt 16384 thrpt 40 26001.888 ± 71.907 ops/s
>
> ChaCha20Poly1305.decrypt 256 thrpt 40 351189.393 ± 2209.148 ops/s
> ChaCha20Poly1305.decrypt 1024 thrpt 40 142960.999 ± 361.619 ops/s
> ChaCha20Poly1305.decrypt 4096 thrpt 40 42437.822 ± 85.557 ops/s
> ChaCha20Poly1305.decrypt 16384 thrpt 40 11173.152 ± 24.969 ops/s
> ChaCha20Poly1305.encrypt 256 thrpt 40 444870.664 ± 12571.799 ops/s
> ChaCha20Poly1305.encrypt 1024 thrpt 40 158481.143 ± 2149.208 ops/s
> ChaCha20Poly1305.encrypt 4096 thrpt 40 43610.721 ± 282.795 ops/s
> ChaCha20Poly1305.encrypt 16384 thrpt 40 11150.783 ± 27.911 ops/s
>
> Intrinsics enabled
> ------------------
> Benchmark (dataSize) Mode Cnt Score Error Units
> ChaCha20.decrypt 256 thrpt 40 1907215.648 ± 3163.767 ops/s
> ChaCha20.decrypt 1024 thrpt 40 631804.007 ± 736.430 ops/s
> ChaCha20.decrypt 4096 thrpt 40 172280.991 ± 362.190 ops/s
> ChaCha20.decrypt 16384 thrpt 40 44150.254 ± 98.927 ops/s
> ChaCha20.encrypt 256 thrpt 40 1990050.859 ± 6380.625 ops/s
> ChaCha20.encrypt 1024 thrpt 40 636574.405 ± 3332.471 ops/s
> ChaCha20.encrypt 4096 thrpt 40 173258.615 ± 327.199 ops/s
> ChaCha20.encrypt 16384 thrpt 40 44191.925 ± 72.996 ops/s
>
> ChaCha20Poly1305.decrypt 256 thrpt 40 360555.774 ± 1988.467 ops/s
> ChaCha20Poly1305.decrypt 1024 thrpt 40 162093.489 ± 413.684 ops/s
> ChaCha20Poly1305.decrypt 4096 thrpt 40 50799.888 ± 110.955 ops/s
> ChaCha20Poly1305.decrypt 16384 thrpt 40 13560.165 ± 32.208 ops/s
> ChaCha20Poly1305.encrypt 256 thrpt 40 458079.724 ± 13746.235 ops/s
> ChaCha20Poly1305.encrypt 1024 thrpt 40 188228.966 ± 3498.480 ops/s
> ChaCha20Poly1305.encrypt 4096 thrpt 40 52665.733 ± 151.740 ops/s
> ChaCha20Poly1305.encrypt 16384 thrpt 40 13606.192 ± 52.134 ops/s
>
> Special thanks to the folks who have made many helpful comments while this PR was in draft form.

src/hotspot/cpu/x86/assembler_x86.cpp line 5034:

> 5032: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() :
> 5033: (vector_len == AVX_256bit ? VM_Version::supports_avx2() :
> 5034: (vector_len == AVX_512bit ? VM_Version::supports_avx512bw() : 0)), "");

It's better to use `false` rather than `0`.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5364:

> 5362:
> 5363: /* The 2-block AVX/AVX2-enabled ChaCha20 block function implementation */
> 5364: address generate_chacha20Block_avx() {

Considering you already introduce a dedicated CPP file, it makes sense to move the guts of this function into `macroAssembler_x86_chapoly.cpp`.

src/hotspot/cpu/x86/vm_version_x86.cpp line 1128:

> 1126: // based on the VM capabilities whether to use an AVX2 or AVX512-enabled
> 1127: // version.
> 1128: if ((UseSSE >= 2) && (UseAVX >= 1) ) {

`UseAVX > 1` already implies `UseSSE >=2`.

src/hotspot/share/opto/library_call.cpp line 6913:

> 6911: Node* cc20Blk = make_runtime_call(RC_LEAF|RC_NO_FP,
> 6912: OptoRuntime::chacha20Block_Type(),
> 6913: stubAddr, stubName, TypePtr::BOTTOM,

BTW it can be further improved: the stub reads from `int[]` and writes into `byte[]` while `TypePtr::BOTTOM` signals both in and out memory state is wide.
`GraphKit::make_runtime_call()` doesn't support it yet, but if you pass input and output address types separately, it should be possible to turn both into narrow memory and represent the runtime call accordingly (see `wide_in`/`wide_out`-related code in `GraphKit::make_runtime_call()`). Also, it can be done as a follow-up enhancement later.

-------------

PR: https://git.openjdk.org/jdk/pull/7702

From jnimeh at openjdk.org Mon Nov 7 07:34:31 2022
From: jnimeh at openjdk.org (Jamil Nimeh)
Date: Mon, 7 Nov 2022 07:34:31 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <bfvITmE9Ikr80Nagm_5JQuvxK7QoU_sXFGYgnMLfZgQ=.87249f91-ca3f-4529-a1a4-cb642480b0da@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
 <bfvITmE9Ikr80Nagm_5JQuvxK7QoU_sXFGYgnMLfZgQ=.87249f91-ca3f-4529-a1a4-cb642480b0da@github.com>
Message-ID: <LEzZ_esHmVlGl4snVbW1fuNuYshGI6opyJeQG4157RM=.2e6abf99-d02f-4462-9eff-f281d0c98ce7@github.com>

On Mon, 6 Jun 2022 20:57:49 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets:
>>
>> - x86_64: AVX, AVX2 and AVX512
>> - aarch64: platforms that support the advanced SIMD instructions
>>
>> [...]
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5364:
>
>> 5362:
>> 5363: /* The 2-block AVX/AVX2-enabled ChaCha20 block function implementation */
>> 5364: address generate_chacha20Block_avx() {
>
> Considering you already introduce a dedicated CPP file, it makes sense to move the guts of this function into `macroAssembler_x86_chapoly.cpp`.

I've updated the code to follow your example with AES and moved the intrinsics into their own stubGenerator_x86_64_chapoly.cpp. I hope that will compartmentalize things. I'm not sure if I should combine the macroAssembler_x86_64_chapoly.cpp with the new stubGenerator file. Certainly willing to do that, but I wanted to get the dedicated stubGenerator file working first since the last merge was ugly. Going forward merges should be much easier now that my code is compartmentalized.

-------------

PR: https://git.openjdk.org/jdk/pull/7702

From jnimeh at openjdk.org Mon Nov 7 07:34:32 2022
From: jnimeh at openjdk.org (Jamil Nimeh)
Date: Mon, 7 Nov 2022 07:34:32 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <5vHrtG9EzPVw4Rzda3-H85Upr1jANl472BED7D38Vw4=.38e54240-afe5-449b-b357-adfe99ed3839@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
 <e7YhTmbhucfO2m_KITHD4rcwuqAK1zdA-_V8Q2oAdLk=.3e88980c-1a19-426b-8444-db8697d29f8b@github.com>
 <5vHrtG9EzPVw4Rzda3-H85Upr1jANl472BED7D38Vw4=.38e54240-afe5-449b-b357-adfe99ed3839@github.com>
Message-ID: <DDY2X4rX-b_RTtqS5u6wlY2uGWahm8WKTZdLc6s4i7o=.1368a123-0468-4fb1-95e3-718cc47caffc@github.com>

On Tue, 22 Mar 2022 23:07:30 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 5842:
>>
>>> 5840: __ evpscatterdd(Address(result, zmm_scratch, Address::times_4, 32), writeMask, zmm_cVec, Assembler::AVX_512bit);
>>> 5841: __ knotwl(writeMask, writeMask);
>>> 5842: __ evpscatterdd(Address(result, zmm_scratch, Address::times_4, 48), writeMask, zmm_dVec, Assembler::AVX_512bit);
>>
>> Using vextracti32x4 instead of evpscatterdd would give better performance:
>> __ vextracti32x4(Address(result, 0), zmm_aVec, 0);
>> __ vextracti32x4(Address(result, 64), zmm_aVec, 1);
>> __ vextracti32x4(Address(result, 128), zmm_aVec, 2);
>> __ vextracti32x4(Address(result, 192), zmm_aVec, 3);
>> __ vextracti32x4(Address(result, 16), zmm_bVec, 0);
>> __ vextracti32x4(Address(result, 80), zmm_bVec, 1);
>> __ vextracti32x4(Address(result, 144), zmm_bVec, 2);
>> __ vextracti32x4(Address(result, 208), zmm_bVec, 3);
>> __ vextracti32x4(Address(result, 32), zmm_cVec, 0);
>> __ vextracti32x4(Address(result, 96), zmm_cVec, 1);
>> __ vextracti32x4(Address(result, 160), zmm_cVec, 2);
>> __ vextracti32x4(Address(result, 224), zmm_cVec, 3);
>> __ vextracti32x4(Address(result, 48), zmm_dVec, 0);
>> __ vextracti32x4(Address(result, 112), zmm_dVec, 1);
>> __ vextracti32x4(Address(result, 176), zmm_dVec, 2);
>> __ vextracti32x4(Address(result, 240), zmm_dVec, 3);
>
> I have been wondering about this approach for a while now, since I did something similar for the AVX2 version. I had assumed that evpscatterdd used fewer instructions and therefore would be more efficient, but I'm more than happy to move to the vextracti32x4 approach. I'll be eager to see how it impacts performance along with the increased storage of intermediate data on additional XMMRegister objects.

The changes you recommended yielded about a 10-15% performance improvement on the system I was using for benchmarks. Thanks for the suggestions!

-------------

PR: https://git.openjdk.org/jdk/pull/7702

From luhenry at openjdk.org Mon Nov 7 07:39:26 2022
From: luhenry at openjdk.org (Ludovic Henry)
Date: Mon, 7 Nov 2022 07:39:26 GMT
Subject: RFR: 8296447: RISC-V: Make the operands order of vrsub_vx/vrsub_vi consistent with RVV 1.0 spec [v2]
In-Reply-To: <I7ynViT1NGZhHBBGb37Twqt9grDcUDGB9eA9aOdMe38=.bfbd99ec-2134-4b0c-9f96-a50beda79dde@github.com>
References: <arChiUBGra90rJKJdaDfUOpyTPRThqevpkrAPRja3cM=.e800c3f5-8bcc-4770-9c56-b9ec955bb760@github.com>
 <I7ynViT1NGZhHBBGb37Twqt9grDcUDGB9eA9aOdMe38=.bfbd99ec-2134-4b0c-9f96-a50beda79dde@github.com>
Message-ID: <GPMDsrypVIG1tygjT_C_RLif14n9K77AT_wBN5mYAso=.8f8dca7c-6599-4638-8f6d-13761efb585b@github.com>

On Mon, 7 Nov 2022 06:05:16 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

>> Hi,
>>
>> At the moment, the operand order of `vrsub_vx` and `vrsub_vi` is not the same as in the RVV 1.0 spec [1]. These instructions use the wrong assembly syntax pattern for vector binary arithmetic instructions (multiply-add) [2].
>>
>> `vrsub_vx` is classified under `Vector Single-Width Integer Add and Subtract` in the RVV 1.0 spec, but it is currently classified under `Vector Single-Width Integer Multiply-Add Instructions` and its function is generated under the corresponding macros, which results in the reverse order of the operands `Vs2` and `Rs1` compared to the spec.
>>
>> `vrsub_vi` has its own separate macro definition to generate the corresponding function, and the order of its operands (`Vs2` and `imm`) is reversed too.
>>
>> I think it is better to adjust the operand order of these two instructions to be consistent with the spec.
>>
>> Please take a look and review. Thanks a lot.
>>
>>
>> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
>> [2] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#101-vector-arithmetic-instruction-encoding
>>
>> ## Testing:
>>
>> - hotspot and jdk tier1 on an Unmatched board without new failures
>> - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu
>> - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu
>
> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove duplicate macro definition

I also verified it matches with https://github.com/riscv/riscv-opcodes/blob/master/rv_v

-------------

Marked as reviewed by luhenry (Author).
PR: https://git.openjdk.org/jdk/pull/11009 From djelinski at openjdk.org Mon Nov 7 08:08:25 2022 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Mon, 7 Nov 2022 08:08:25 GMT Subject: RFR: 8247645: ChaCha20 intrinsics In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> Message-ID: <QhaLrjytvPfClkXYMhnAArELqVrbivgiKKgCqGeu0Ug=.54ca5059-ed0f-4f12-a41b-81836497f683@github.com> On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: > This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: > > - x86_64: AVX, AVX2 and AVX512 > - aarch64: platforms that support the advanced SIMD instructions > > Microbenchmark results (Note: ChaCha20-Poly1305 numbers do not include the pending Poly1305 intrinsics to be delivered in #10582) > > x86_64 > Processor: 4x Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz > > Java only (-XX:-UseChaCha20Intrinsics) > -------------------------------------- > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 772956.829 ? 4434.965 ops/s > ChaCha20.decrypt 1024 thrpt 40 230478.075 ? 660.617 ops/s > ChaCha20.decrypt 4096 thrpt 40 61504.367 ? 187.485 ops/s > ChaCha20.decrypt 16384 thrpt 40 15671.893 ? 59.860 ops/s > ChaCha20.encrypt 256 thrpt 40 793708.698 ? 3587.562 ops/s > ChaCha20.encrypt 1024 thrpt 40 232413.842 ? 808.766 ops/s > ChaCha20.encrypt 4096 thrpt 40 61586.483 ? 94.821 ops/s > ChaCha20.encrypt 16384 thrpt 40 15749.637 ? 34.497 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 219991.514 ? 2117.364 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 101672.568 ? 1921.214 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 32582.073 ? 
946.061 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 8485.793 ? 26.348 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 291605.327 ? 2893.898 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 121034.948 ? 2545.312 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 32657.343 ? 114.322 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 8527.834 ? 33.711 ops/s > > Intrinsics enabled (-XX:UseAVX=1) > --------------------------------- > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 1293211.662 ? 9833.892 ops/s > ChaCha20.decrypt 1024 thrpt 40 450135.559 ? 1614.303 ops/s > ChaCha20.decrypt 4096 thrpt 40 123675.797 ? 576.160 ops/s > ChaCha20.decrypt 16384 thrpt 40 31707.566 ? 93.988 ops/s > ChaCha20.encrypt 256 thrpt 40 1338667.215 ? 12012.240 ops/s > ChaCha20.encrypt 1024 thrpt 40 453682.363 ? 2559.322 ops/s > ChaCha20.encrypt 4096 thrpt 40 124785.645 ? 394.535 ops/s > ChaCha20.encrypt 16384 thrpt 40 31788.969 ? 90.770 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 250683.639 ? 3990.340 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 131000.144 ? 2895.410 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 45215.542 ? 1368.148 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 11879.307 ? 55.006 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 355255.774 ? 5397.267 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 156057.380 ? 4294.091 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 47016.845 ? 1618.779 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 12113.919 ? 45.792 ops/s > > Intrinsics enabled (-XX:UseAVX=2) > --------------------------------- > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 1824729.604 ? 12130.198 ops/s > ChaCha20.decrypt 1024 thrpt 40 746024.477 ? 3921.472 ops/s > ChaCha20.decrypt 4096 thrpt 40 219662.823 ? 2128.901 ops/s > ChaCha20.decrypt 16384 thrpt 40 57198.868 ? 221.973 ops/s > ChaCha20.encrypt 256 thrpt 40 1893810.127 ? 21870.718 ops/s > ChaCha20.encrypt 1024 thrpt 40 758024.511 ? 
5414.552 ops/s > ChaCha20.encrypt 4096 thrpt 40 224032.805 ? 935.309 ops/s > ChaCha20.encrypt 16384 thrpt 40 58112.296 ? 498.048 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 260529.149 ? 4298.662 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 144967.984 ? 4558.697 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 50047.575 ? 171.204 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 13976.999 ? 72.299 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 378971.408 ? 9324.721 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 179361.248 ? 7968.109 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 55727.145 ? 2860.765 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 14205.830 ? 59.411 ops/s > > Intrinsics enabled (-XX:UseAVX=3) > --------------------------------- > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 1182958.956 ? 7782.532 ops/s > ChaCha20.decrypt 1024 thrpt 40 1003530.400 ? 10315.996 ops/s > ChaCha20.decrypt 4096 thrpt 40 339428.341 ? 2376.804 ops/s > ChaCha20.decrypt 16384 thrpt 40 92903.498 ? 1112.425 ops/s > ChaCha20.encrypt 256 thrpt 40 1266584.736 ? 5101.597 ops/s > ChaCha20.encrypt 1024 thrpt 40 1059717.173 ? 9435.649 ops/s > ChaCha20.encrypt 4096 thrpt 40 350520.581 ? 2787.593 ops/s > ChaCha20.encrypt 16384 thrpt 40 95181.548 ? 1638.579 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 200722.479 ? 2045.896 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 124660.386 ? 3869.517 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 44059.327 ? 143.765 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 12412.936 ? 54.845 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 274528.005 ? 2945.416 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 145146.188 ? 857.254 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 47045.637 ? 128.049 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 12643.929 ? 
55.748 ops/s > > aarch64 > Processor: 2 x CPU implementer : 0x41, architecture: 8, variant : 0x3, > part : 0xd0c, revision : 1 > > Java only (-XX:-UseChaCha20Intrinsics) > -------------------------------------- > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 1301037.920 ? 1734.836 ops/s > ChaCha20.decrypt 1024 thrpt 40 387115.013 ? 1122.264 ops/s > ChaCha20.decrypt 4096 thrpt 40 102591.108 ? 229.456 ops/s > ChaCha20.decrypt 16384 thrpt 40 25878.583 ? 89.351 ops/s > ChaCha20.encrypt 256 thrpt 40 1332737.880 ? 2478.508 ops/s > ChaCha20.encrypt 1024 thrpt 40 390288.663 ? 2361.851 ops/s > ChaCha20.encrypt 4096 thrpt 40 101882.728 ? 744.907 ops/s > ChaCha20.encrypt 16384 thrpt 40 26001.888 ? 71.907 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 351189.393 ? 2209.148 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 142960.999 ? 361.619 ops/s > ChaCha20Poly1305.decrypt 4096 thrpt 40 42437.822 ? 85.557 ops/s > ChaCha20Poly1305.decrypt 16384 thrpt 40 11173.152 ? 24.969 ops/s > ChaCha20Poly1305.encrypt 256 thrpt 40 444870.664 ? 12571.799 ops/s > ChaCha20Poly1305.encrypt 1024 thrpt 40 158481.143 ? 2149.208 ops/s > ChaCha20Poly1305.encrypt 4096 thrpt 40 43610.721 ? 282.795 ops/s > ChaCha20Poly1305.encrypt 16384 thrpt 40 11150.783 ? 27.911 ops/s > > Intrinsics enabled > ------------------ > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.decrypt 256 thrpt 40 1907215.648 ? 3163.767 ops/s > ChaCha20.decrypt 1024 thrpt 40 631804.007 ? 736.430 ops/s > ChaCha20.decrypt 4096 thrpt 40 172280.991 ? 362.190 ops/s > ChaCha20.decrypt 16384 thrpt 40 44150.254 ? 98.927 ops/s > ChaCha20.encrypt 256 thrpt 40 1990050.859 ? 6380.625 ops/s > ChaCha20.encrypt 1024 thrpt 40 636574.405 ? 3332.471 ops/s > ChaCha20.encrypt 4096 thrpt 40 173258.615 ? 327.199 ops/s > ChaCha20.encrypt 16384 thrpt 40 44191.925 ? 72.996 ops/s > > ChaCha20Poly1305.decrypt 256 thrpt 40 360555.774 ? 1988.467 ops/s > ChaCha20Poly1305.decrypt 1024 thrpt 40 162093.489 ? 
413.684 ops/s
> ChaCha20Poly1305.decrypt 4096 thrpt 40 50799.888 ± 110.955 ops/s
> ChaCha20Poly1305.decrypt 16384 thrpt 40 13560.165 ± 32.208 ops/s
> ChaCha20Poly1305.encrypt 256 thrpt 40 458079.724 ± 13746.235 ops/s
> ChaCha20Poly1305.encrypt 1024 thrpt 40 188228.966 ± 3498.480 ops/s
> ChaCha20Poly1305.encrypt 4096 thrpt 40 52665.733 ± 151.740 ops/s
> ChaCha20Poly1305.encrypt 16384 thrpt 40 13606.192 ± 52.134 ops/s
>
> Special thanks to the folks who have made many helpful comments while this PR was in draft form.

Is it expected that AVX3 is 35% slower than AVX2 and 8% slower than AVX1?

-------------

PR: https://git.openjdk.org/jdk/pull/7702

From jnimeh at openjdk.org Mon Nov 7 08:54:08 2022
From: jnimeh at openjdk.org (Jamil Nimeh)
Date: Mon, 7 Nov 2022 08:54:08 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <QhaLrjytvPfClkXYMhnAArELqVrbivgiKKgCqGeu0Ug=.54ca5059-ed0f-4f12-a41b-81836497f683@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <QhaLrjytvPfClkXYMhnAArELqVrbivgiKKgCqGeu0Ug=.54ca5059-ed0f-4f12-a41b-81836497f683@github.com>
Message-ID: <ph6Wa6IO7lLcbTyi1yI3ZYsZejpWY7cNy1pzkMBh0Yo=.b0674112-5547-468a-ab53-262be342ee24@github.com>

On Mon, 7 Nov 2022 08:04:15 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

> Is it expected that AVX3 is 35% slower than AVX2 and 8% slower than AVX1?

Well, it isn't slower than AVX/AVX2 across the board. For plain ChaCha20 it is slower for this particular benchmark at 256 bytes (and smaller I would assume), but that changes at data sizes above 256 bytes. I haven't worked out the timings exactly, but this is what I think is happening: The AVX512 intrinsic broadcasts into the registers from memory using twice as many registers and at twice the size over AVX2, and likewise writes 4x the amount of data into the keystream buffer upon completion.
I'm not certain by how much, but I believe the runtime of a single call to the AVX512 intrinsic is longer than that of the AVX/AVX2 intrinsic. When the job size is 256 bytes, both AVX2 and AVX512 run their intrinsics once, and that may account for the speed difference. For AVX at 256 bytes the intrinsic has to run twice (which is why the slowdown relative to AVX is smaller). When you get to 1024 bytes, AVX has to run 8 times to make enough key stream and AVX2 has to run 4 times, but AVX512 still only has to run once. So now AVX512 outperforms the other two, and continues to do so for any larger single-part encryption job this benchmark is doing. I haven't tried running other sizes yet to see where the cross-over point is, but I suspect it is probably once a job gets above 512 bytes.

This particular benchmark is a single-part encryption or decryption. The performance characteristics look different when you take a large buffer and submit multi-part updates. In that case, with 16-byte updates AVX512 and AVX2 are nearly identical (AVX2 is 2% faster, AVX is already 23% slower); at 64 bytes AVX512 is faster than everything, and the gap widens as the update sizes grow. To be fair, I think the single-part jobs are more representative of what we would see in JSSE, but TLS application data job sizes are probably all over the map depending on what is being sent.

For ChaCha20-Poly1305, AVX512 is slower than AVX2 across the board, and I am not sure why the throughput gains we see in ChaCha20 do not show up here, even at larger sizes. There's a lot more work being done outside the cc20 intrinsic, especially without the pending AVX512 poly1305 intrinsic, but I would've expected to see a crossover point at one of those benchmark sizes.
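
The invocation counts mentioned above amount to a ceiling division of the job size by the keystream produced per call. A minimal sketch, assuming per-call outputs of 128 bytes (AVX), 256 bytes (AVX2) and 1024 bytes (AVX512); these sizes are inferred from the counts quoted in this message rather than taken from the intrinsic source, and the class and method names are hypothetical:

```java
// Hypothetical helper for illustration only; not part of the JDK.
final class KeystreamCalls {
    // Assumed keystream bytes produced per intrinsic call, inferred from the
    // counts in the message (AVX runs twice for 256 bytes, AVX2 once,
    // AVX512 once for anything up to 1024 bytes).
    static final int AVX_BYTES = 128;
    static final int AVX2_BYTES = 256;
    static final int AVX512_BYTES = 1024;

    // Number of intrinsic calls needed to cover jobSize bytes (ceiling division).
    static int calls(int jobSize, int bytesPerCall) {
        return (jobSize + bytesPerCall - 1) / bytesPerCall;
    }
}
```

Under these assumptions a 256-byte job needs 2, 1 and 1 calls for AVX, AVX2 and AVX512, and a 1024-byte job needs 8, 4 and 1, which matches the counts given above. A 512-byte job would need 4, 2 and 1, which is at least consistent with the guess that the cross-over lies somewhere above 256 bytes.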
-------------

PR: https://git.openjdk.org/jdk/pull/7702

From lkorinth at openjdk.org Mon Nov 7 09:01:44 2022
From: lkorinth at openjdk.org (Leo Korinth)
Date: Mon, 7 Nov 2022 09:01:44 GMT
Subject: RFR: 8296401: ConcurrentHashTable::bulk_delete might miss to delete some objects
In-Reply-To: <rWw3Hb1baG5eslMgFnOdEP1Q4avhfYc3uTpjdB9DaP0=.23f34ff0-eff9-421b-9e70-7f119fd90e5d@github.com>
References: <rWw3Hb1baG5eslMgFnOdEP1Q4avhfYc3uTpjdB9DaP0=.23f34ff0-eff9-421b-9e70-7f119fd90e5d@github.com>
Message-ID: <fgI1HOxvAY9yBOnav9Lemxkn7go3DTWYVIjSsOkdJu4=.a3b86100-581c-43f1-a184-8e6ec0d1158c@github.com>

On Fri, 4 Nov 2022 13:38:23 GMT, Leo Korinth <lkorinth at openjdk.org> wrote:

> ConcurrentHashTable::bulk_delete might miss to delete some objects if a bucket has more than 256 entries. Current uses of ConcurrentHashTable are not harmed by this behaviour.
>
> I modified gtest:ConcurrentHashTable to detect the problem (first commit), and fixed the problem in the code (second commit).
>
> Tests passes tier1-3.

The function `delete_check_nodes` breaks after 256 entries:

    if (dels == num_del) {
      break;

My added `for (;;)` will remove entries until the bucket is empty.
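
The batching behaviour described above can be modelled in a few lines. This is only a sketch in Java of the control flow, assuming a bucket can be treated as a plain list; it is not the HotSpot C++ code, and `BatchedDelete`, `deleteOnePass` and `deleteAll` are hypothetical names:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the batching behaviour; not the HotSpot code.
final class BatchedDelete {
    // Mirrors the 256-entry limit mentioned in the message.
    static final int BATCH_LIMIT = 256;

    // One pass over the bucket: removes at most BATCH_LIMIT matching entries
    // and returns how many were removed.
    static <T> int deleteOnePass(List<T> bucket, Predicate<T> dead) {
        int dels = 0;
        Iterator<T> it = bucket.iterator();
        while (it.hasNext()) {
            if (dead.test(it.next())) {
                it.remove();
                if (++dels == BATCH_LIMIT) {
                    break; // early exit: later matching entries stay in the bucket
                }
            }
        }
        return dels;
    }

    // Repeats passes until one of them removes fewer than BATCH_LIMIT entries,
    // i.e. until no matching entries are left.
    static <T> int deleteAll(List<T> bucket, Predicate<T> dead) {
        int total = 0;
        for (;;) {
            int dels = deleteOnePass(bucket, dead);
            total += dels;
            if (dels < BATCH_LIMIT) {
                break;
            }
        }
        return total;
    }
}
```

With a bucket of 600 matching entries, a single pass in this model removes 256 and leaves 344 behind, while the looping variant removes all 600.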
------------- PR: https://git.openjdk.org/jdk/pull/10983 From mcimadamore at openjdk.org Mon Nov 7 09:24:04 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 7 Nov 2022 09:24:04 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v2] In-Reply-To: <5pJEOFSd5uXPZ5W8_SC3dPizt8x83yjMPbtt2gmFwfA=.38b93d3f-13f3-4455-ac0a-33dbca8f44bd@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IxHlukr_bx6t1miZypvnq_8eWAgyouVs1mdNOhFW3bE=.4efc799a-206e-4f05-a28f-fab819cd4334@github.com> <5pJEOFSd5uXPZ5W8_SC3dPizt8x83yjMPbtt2gmFwfA=.38b93d3f-13f3-4455-ac0a-33dbca8f44bd@github.com> Message-ID: <uwIHMZkXTUhF7LQ28QNk7LI2Vls1fyZ6t24DDkP1vpI=.526479a8-c268-41f0-ab47-74b5e06f8311@github.com> On Sat, 5 Nov 2022 18:02:33 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: >> >> - Merge branch 'master' into PR_20 >> - Merge branch 'master' into PR_20 >> - Merge pull request #14 from minborg/small-javadoc >> >> Update some javadocs >> - Update some javadocs >> - Revert some javadoc changes >> - Merge branch 'master' into PR_20 >> - Fix benchmark and test failure >> - Merge pull request #13 from minborg/revert-factories >> >> Revert MemorySegment factories >> - Update javadocs after comments >> - Revert MemorySegment factories >> - ... 
and 7 more: https://git.openjdk.org/jdk/compare/d314527d...3d933028 > > src/java.base/share/classes/jdk/internal/foreign/abi/BindingSpecializer.java line 477: > >> 475: case UNBOX_ADDRESS -> emitUnboxAddress(); >> 476: case DUP -> emitDupBinding(); >> 477: case CAST -> emitCast((Binding.Cast) binding); > > This contains the CAST binding, but not the accompanying VM changes from: https://github.com/openjdk/panama-foreign/pull/720 which removes the now dead code. Preferably both changes go together (and the code removal is pretty trivial, so I suggest including it here) Why did the normalization test passed even w/o VM changes? Is that because the VM code changes are just removing what is now dead code, right? ------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Mon Nov 7 09:30:19 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 7 Nov 2022 09:30:19 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v2] In-Reply-To: <5pJEOFSd5uXPZ5W8_SC3dPizt8x83yjMPbtt2gmFwfA=.38b93d3f-13f3-4455-ac0a-33dbca8f44bd@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IxHlukr_bx6t1miZypvnq_8eWAgyouVs1mdNOhFW3bE=.4efc799a-206e-4f05-a28f-fab819cd4334@github.com> <5pJEOFSd5uXPZ5W8_SC3dPizt8x83yjMPbtt2gmFwfA=.38b93d3f-13f3-4455-ac0a-33dbca8f44bd@github.com> Message-ID: <gjk9ZLzpYbu9cVSzfcq6DQ2NDzZHzlTznd-24UBFLgM=.710eb153-1731-415b-87b2-4d2fedbb32bb@github.com> On Sat, 5 Nov 2022 18:04:38 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 17 additional commits since the last revision: >> >> - Merge branch 'master' into PR_20 >> - Merge branch 'master' into PR_20 >> - Merge pull request #14 from minborg/small-javadoc >> >> Update some javadocs >> - Update some javadocs >> - Revert some javadoc changes >> - Merge branch 'master' into PR_20 >> - Fix benchmark and test failure >> - Merge pull request #13 from minborg/revert-factories >> >> Revert MemorySegment factories >> - Update javadocs after comments >> - Revert MemorySegment factories >> - ... and 7 more: https://git.openjdk.org/jdk/compare/1fd35b8a...3d933028 > > src/java.base/share/classes/jdk/internal/foreign/abi/BindingSpecializer.java line 491: > >> 489: emitLoad(highLevelType, paramIndex2ParamSlot[paramIndex]); >> 490: >> 491: if (shouldAcquire(paramIndex)) { > > I can't comment on the actual line below, but this is also missing the fix from: https://github.com/openjdk/panama-foreign/pull/739 (that is a Java-only change as well). I suggest adding that as well. Few changes were missing - but the `dontrelease` tests was passing... odd ------------- PR: https://git.openjdk.org/jdk/pull/10872 From lkorinth at openjdk.org Mon Nov 7 09:31:13 2022 From: lkorinth at openjdk.org (Leo Korinth) Date: Mon, 7 Nov 2022 09:31:13 GMT Subject: RFR: 8296401: ConcurrentHashTable::bulk_delete might miss to delete some objects In-Reply-To: <rWw3Hb1baG5eslMgFnOdEP1Q4avhfYc3uTpjdB9DaP0=.23f34ff0-eff9-421b-9e70-7f119fd90e5d@github.com> References: <rWw3Hb1baG5eslMgFnOdEP1Q4avhfYc3uTpjdB9DaP0=.23f34ff0-eff9-421b-9e70-7f119fd90e5d@github.com> Message-ID: <hZU4wQVp14chITsuPCoUcJdipUIjzR7QDAI8phEkwcI=.d99495e4-34f5-4d3a-b90a-4f9dc5a9a215@github.com> On Fri, 4 Nov 2022 13:38:23 GMT, Leo Korinth <lkorinth at openjdk.org> wrote: > ConcurrentHashTable::bulk_delete might miss to delete some objects if a bucket has more than 256 entries. Current uses of ConcurrentHashTable are not harmed by this behaviour. 
> > I modified gtest:ConcurrentHashTable to detect the problem (first commit), and fixed the problem in the code (second commit). > > Tests passes tier1-3. They will be deleted BULK_DELETE_LIMIT (256) at a time until the bucket is empty, and the control flow will then exit with the help of a `break` (instead of a `continue`). ------------- PR: https://git.openjdk.org/jdk/pull/10983 From mcimadamore at openjdk.org Mon Nov 7 09:42:25 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 7 Nov 2022 09:42:25 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v2] In-Reply-To: <5pJEOFSd5uXPZ5W8_SC3dPizt8x83yjMPbtt2gmFwfA=.38b93d3f-13f3-4455-ac0a-33dbca8f44bd@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IxHlukr_bx6t1miZypvnq_8eWAgyouVs1mdNOhFW3bE=.4efc799a-206e-4f05-a28f-fab819cd4334@github.com> <5pJEOFSd5uXPZ5W8_SC3dPizt8x83yjMPbtt2gmFwfA=.38b93d3f-13f3-4455-ac0a-33dbca8f44bd@github.com> Message-ID: <09WcJa8UMXNVYkW_eI7w-tW7fnZ6KXt1gt_4cu5Y2KE=.8be58947-1169-42e0-8c6f-45b66ff63c10@github.com> On Sat, 5 Nov 2022 18:40:56 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: >> >> - Merge branch 'master' into PR_20 >> - Merge branch 'master' into PR_20 >> - Merge pull request #14 from minborg/small-javadoc >> >> Update some javadocs >> - Update some javadocs >> - Revert some javadoc changes >> - Merge branch 'master' into PR_20 >> - Fix benchmark and test failure >> - Merge pull request #13 from minborg/revert-factories >> >> Revert MemorySegment factories >> - Update javadocs after comments >> - Revert MemorySegment factories >> - ... 
and 7 more: https://git.openjdk.org/jdk/compare/d8bb7119...3d933028 > > src/java.base/share/classes/jdk/internal/foreign/abi/x64/windows/CallArranger.java line 165: > >> 163: assert forArguments : "no stack returns"; >> 164: // stack >> 165: long alignment = Math.max(layout.byteAlignment(), STACK_SLOT_SIZE); > > This is also missing part of the changes from: https://github.com/openjdk/panama-foreign/pull/728/ but other changes to the shared code are present. The `layout` parameter is not needed here. (see the changes to this file in the original PR) Actually, this patch is missing most of the stuff in PR 728. I was under the impression that, in order to fully support that, some VM changes were needed (e.g. to have better granularity in call shuffling - as per https://github.com/openjdk/panama-foreign/pull/699). As a result, this PR only contains changes to SharedUtil (to remove unused alignment functions) - but nothing else. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Mon Nov 7 09:47:34 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 7 Nov 2022 09:47:34 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v3] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <JcunES0_tenzQnCZPFDzM3kKAtKLaOXmOz10mujtzmk=.850e7685-c88c-44cc-8053-5f472aa0f7d8@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: - Fix mismmatched acquire/release in BindingSpecializer - Fix MemorySegment.ofBuffer when applied to StringCharBuffer - Remove VM dead code after implementation of Binding.Cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/3d933028..e8b95f83 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=01-02 Stats: 51 lines in 6 files changed: 12 ins; 34 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Mon Nov 7 12:29:37 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 7 Nov 2022 12:29:37 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v4] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <b8jy-BY9ykLzHYS5hmf8_rNZvY6wLflhCUeWQF3_1qs=.2e9bb0a3-c25d-495c-98a5-14b06bd15550@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>
> [1] - https://openjdk.org/jeps/434

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Add missing tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10872/files
  - new: https://git.openjdk.org/jdk/pull/10872/files/e8b95f83..0c70da2c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=02-03

  Stats: 162 lines in 3 files changed: 162 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/10872.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872

PR: https://git.openjdk.org/jdk/pull/10872

From aboldtch at openjdk.org Mon Nov 7 13:31:09 2022
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 7 Nov 2022 13:31:09 GMT
Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing
Message-ID: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com>

Add reentrant step logic to VMError::report with an inner loop, which enables the logic to recover at every step of the iteration.

Before this change, if printing one register/stack position crashes, then no more registers/stack positions will be printed.

After this change, even if the VM is unstable and print_location crashes for some registers, the hs_err printing will recover and keep attempting to print the rest of the registers or stack values.
Enables the following
```C++
  REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized())
    os::print_register_info_header(st, _context);

    REENTRANT_LOOP_START(os::print_nth_register_info_max_index())
      // decode register contents if possible
      ResourceMark rm(_thread);
      os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context);
    REENTRANT_LOOP_END

    st->cr();
```

Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local)

-------------

Commit messages:
 - Add os::print_register_info_header
 - VMError::report reentrant for register and stack print

Changes: https://git.openjdk.org/jdk/pull/11017/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8296469
  Stats: 747 lines in 19 files changed: 477 ins; 123 del; 147 mod
  Patch: https://git.openjdk.org/jdk/pull/11017.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11017/head:pull/11017

PR: https://git.openjdk.org/jdk/pull/11017

From aboldtch at openjdk.org Mon Nov 7 13:33:36 2022
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 7 Nov 2022 13:33:36 GMT
Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability
Message-ID: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com>

Refactor the STEP macro in VMError::report to improve readability. Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand.

This enhancement aims to do two things:
1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer.
2.
Separates the internal step logic from the decision logic which decides if a step should be taken with a STEP_IF(step_name_str, condition) macro Testing: tier 1 + GHA ------------- Commit messages: - Fix STEP_IF condition crash and clear_step_start_time - VMError::report STEP_IF and cleanup Changes: https://git.openjdk.org/jdk/pull/11018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11018&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296470 Stats: 729 lines in 2 files changed: 169 ins; 297 del; 263 mod Patch: https://git.openjdk.org/jdk/pull/11018.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11018/head:pull/11018 PR: https://git.openjdk.org/jdk/pull/11018 From jvernee at openjdk.org Mon Nov 7 13:45:33 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 7 Nov 2022 13:45:33 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v2] In-Reply-To: <09WcJa8UMXNVYkW_eI7w-tW7fnZ6KXt1gt_4cu5Y2KE=.8be58947-1169-42e0-8c6f-45b66ff63c10@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IxHlukr_bx6t1miZypvnq_8eWAgyouVs1mdNOhFW3bE=.4efc799a-206e-4f05-a28f-fab819cd4334@github.com> <5pJEOFSd5uXPZ5W8_SC3dPizt8x83yjMPbtt2gmFwfA=.38b93d3f-13f3-4455-ac0a-33dbca8f44bd@github.com> <09WcJa8UMXNVYkW_eI7w-tW7fnZ6KXt1gt_4cu5Y2KE=.8be58947-1169-42e0-8c6f-45b66ff63c10@github.com> Message-ID: <b1ckc2kdUNotWjqf-qSLlZsKK-LrBe90YD5JEIw8M58=.61cda3cf-a617-4cc1-8875-97dda762dd4b@github.com> On Mon, 7 Nov 2022 09:40:03 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/x64/windows/CallArranger.java line 165: >> >>> 163: assert forArguments : "no stack returns"; >>> 164: // stack >>> 165: long alignment = Math.max(layout.byteAlignment(), STACK_SLOT_SIZE); >> >> This is also missing part of the changes from: https://github.com/openjdk/panama-foreign/pull/728/ but other changes to the shared code 
are present. The `layout` parameter is not needed here. (see the changes to this file in the original PR) > > Actually, this patch is missing most of the stuff in PR 728. I was under the impression that, in order to fully support that, some VM changes were needed (e.g. to have better granularity in call shuffling - as per https://github.com/openjdk/panama-foreign/pull/699). As a result, this PR only contains changes to SharedUtil (to remove unused alignment functions) - but nothing else. 699 is not needed for this. 728 is a pure Java change that simply rejects layouts that don't have their natural alignment (so, it will rejects packed structs for instance, since the implementation doesn't support them on all platforms). All the other changes from 728 are here (most notably the code in AbstractLinker that checks the alignment), except the change that ignores the `layout` here and turns the code around the line above into an `assert`. The mac stack spilling patch requires 699 though (https://github.com/openjdk/panama-foreign/pull/746). I will put that in the PR with the VM changes. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From redestad at openjdk.org Mon Nov 7 13:58:14 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 7 Nov 2022 13:58:14 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v5] In-Reply-To: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> Message-ID: <IP0PaG5h7VWGWdGZdXuovknduIS7WQ1hleNpz82aYsk=.2cb075a6-1b33-4790-8590-6279e2d58706@github.com> > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. 
To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 
0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). 
I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 43 commits: - Merge branch 'master' into 8282664-polyhash - Merge branch 'master' into 8282664-polyhash - Change scalar unroll to 2 element stride, minding dependency chain - Require UseSSE >= 3 due transitive use of sse3 instructions from ReduceI - Reorder loops and some other suggestions from @merykitty - ws - Add ArraysHashCode microbenchmarks - Fixed vector loops for int and char arrays - Split up Arrays/HashCode tests - Fixes, optimized short inputs, temporarily disabled vector loop for Arrays.hashCode cases, added and improved tests - ... and 33 more: https://git.openjdk.org/jdk/compare/d634ddef...95c10b5f ------------- Changes: https://git.openjdk.org/jdk/pull/10847/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=04 Stats: 1151 lines in 32 files changed: 1093 ins; 32 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From mcimadamore at openjdk.org Mon Nov 7 14:06:39 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 7 Nov 2022 14:06:39 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v2] In-Reply-To: <b1ckc2kdUNotWjqf-qSLlZsKK-LrBe90YD5JEIw8M58=.61cda3cf-a617-4cc1-8875-97dda762dd4b@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IxHlukr_bx6t1miZypvnq_8eWAgyouVs1mdNOhFW3bE=.4efc799a-206e-4f05-a28f-fab819cd4334@github.com> <5pJEOFSd5uXPZ5W8_SC3dPizt8x83yjMPbtt2gmFwfA=.38b93d3f-13f3-4455-ac0a-33dbca8f44bd@github.com> 
<09WcJa8UMXNVYkW_eI7w-tW7fnZ6KXt1gt_4cu5Y2KE=.8be58947-1169-42e0-8c6f-45b66ff63c10@github.com> <b1ckc2kdUNotWjqf-qSLlZsKK-LrBe90YD5JEIw8M58=.61cda3cf-a617-4cc1-8875-97dda762dd4b@github.com> Message-ID: <PjTd9RxWhqKuuwBjjnZnYy2N2LzUablXp5QY1VWO7Vs=.35eb79fd-0d23-424a-a9fe-7fa3cdcf2697@github.com> On Mon, 7 Nov 2022 13:43:17 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Actually, this patch is missing most of the stuff in PR 728. I was under the impression that, in order to fully support that, some VM changes were needed (e.g. to have better granularity in call shuffling - as per https://github.com/openjdk/panama-foreign/pull/699). As a result, this PR only contains changes to SharedUtil (to remove unused alignment functions) - but nothing else. > > 699 is not needed for this. 728 is a pure Java change that simply rejects layouts that don't have their natural alignment (so, it will rejects packed structs for instance, since the implementation doesn't support them on all platforms). All the other changes from 728 are here (most notably the code in AbstractLinker that checks the alignment), except the change that ignores the `layout` here and turns the code around the line above into an `assert`. > > The mac stack spilling patch requires 699 though (https://github.com/openjdk/panama-foreign/pull/746). I will put that in the PR with the VM changes. Thanks for the clarification - I will incorporate those changes as well then. 
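
The "natural alignment" rule discussed in the quoted text can be illustrated with a small model. This is a sketch under stated assumptions: it does not use the `java.lang.foreign` API, the actual check in AbstractLinker is not shown in this thread, and `NaturalAlignment` with its methods is a hypothetical name. It assumes a scalar field counts as naturally aligned when its alignment equals its size and its offset is a multiple of that alignment:

```java
// Hypothetical model for illustration; it does not use java.lang.foreign.
final class NaturalAlignment {
    // Assumption: a scalar field is naturally aligned when its alignment
    // equals its size and its offset is a multiple of that alignment.
    static boolean isNaturallyAligned(long offset, long size, long alignment) {
        return alignment == size && offset % alignment == 0;
    }

    // Checks every field of a struct described by parallel arrays of
    // byte offsets, byte sizes and byte alignments.
    static boolean structIsNaturallyAligned(long[] offsets, long[] sizes, long[] alignments) {
        for (int i = 0; i < offsets.length; i++) {
            if (!isNaturallyAligned(offsets[i], sizes[i], alignments[i])) {
                return false;
            }
        }
        return true;
    }
}
```

In this model a padded `struct { char c; int i; }` (offsets 0 and 4, alignments 1 and 4) is accepted, while the packed variant (offsets 0 and 1, both alignments 1) is rejected, which is the kind of layout the quoted message says is refused.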
------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Mon Nov 7 14:17:40 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 7 Nov 2022 14:17:40 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v5] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <nLGKXMEfHApyxseQFp882Q-QXNxbkWcChh2Yv1COKG4=.70ef8fea-423b-4034-ad6e-026f2164f3c4@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Bring windows CallArranger in sync with panama repo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/0c70da2c..b98febff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=03-04 Stats: 18 lines in 2 files changed: 0 ins; 1 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From redestad at openjdk.org Mon Nov 7 14:23:44 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 7 Nov 2022 14:23:44 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v6] In-Reply-To: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> Message-ID: 
<erLsmKMYQvrNhI-iJ6L-gIkGuWQD6JrQhUOzgehBd-g=.73956645-086f-4efc-bee3-7160c8f64a68@github.com> > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`.
> > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ±
0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Remove UseSSE >= 3 precondition now that UseAVX > 0 implies UseSSE=4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10847/files - new: https://git.openjdk.org/jdk/pull/10847/files/95c10b5f..cdf276de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Mon Nov 7 14:25:21 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 7 Nov 2022 14:25:21 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: <tUUED_AhQ-59DfLrgFc1EijmNCuhz4uF-v15WZGwwQ8=.4e8e62f5-5f27-4a47-a8f0-ebf6adb41e20@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <tUUED_AhQ-59DfLrgFc1EijmNCuhz4uF-v15WZGwwQ8=.4e8e62f5-5f27-4a47-a8f0-ebf6adb41e20@github.com> Message-ID: <9a0_6b99boFwuVxfQcswe4tQ3UYd7U1Yvv4EzePgH3o=.e8dcffc0-6bd5-4c56-909c-a21acda9a3de@github.com> On Tue, 25 Oct 2022 16:03:28 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ±
0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized).
I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > I did a quick write up explaining the approach at https://gist.github.com/luhenry/2fc408be6f906ef79aaf4115525b9d0c. Also, you can find details in @richardstartin's [blog post](https://richardstartin.github.io/posts/vectorised-polynomial-hash-codes) I've started working on an aarch64 port while @luhenry is working on a forward-iterating variant of his vector algorithm (based on @merykitty's suggestions). However, I'd request that this PR be accepted as-is, with ports and such enhancements done as follow-ups. That'd simplify work since we can continue in parallel with less coordination and fewer merges. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From yadongwang at openjdk.org Mon Nov 7 14:50:30 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Mon, 7 Nov 2022 14:50:30 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v6] In-Reply-To: <qrGTLK-U1f0Dgdgq8dLofSoVWW00UBYAuVz8m2hOjPQ=.e2e18141-41ea-43ea-a215-ba2e6f832d80@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <hVVzJNW5i13eZ3g_aRqJJjuhs98ZOwJdAl4M2gvFNcU=.858ce35f-5e93-4dba-8ef5-1ce307ce3922@github.com> <YOhpqH4hXpsngfmHMhoe6kPPssW1Zz8AxhcemX9AOTA=.41aca39f-ec96-459c-9917-2f32106ade70@github.com> <qrGTLK-U1f0Dgdgq8dLofSoVWW00UBYAuVz8m2hOjPQ=.e2e18141-41ea-43ea-a215-ba2e6f832d80@github.com> Message-ID: <oY2bAmngcp-GLIZ5lNvhPD6s-itQDnbP-kWtvUfCePo=.c0de9fe6-a062-4ff2-a158-de96a1e8df4c@github.com> On Mon, 7 Nov 2022 07:26:41 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> src/hotspot/cpu/riscv/riscv.ad line 5202: >> >>> 5200: __ addi(t0, as_Register($mem$$base), $mem$$disp); >>> 5201: __ prefetch_w(t0, 0); >>> 5202: } >> >> It'd be better to also handle the case where the imm does not fit instructions like prefetch.w
or addi. > > I'm not sure I understand what you're suggesting. I mean we should handle the situation where disp is beyond the 12-bit range of addi or prefetch.w. ------------- PR: https://git.openjdk.org/jdk/pull/10884 From mcimadamore at openjdk.org Mon Nov 7 15:00:02 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 7 Nov 2022 15:00:02 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v6] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Make memory session a pure lifetime abstraction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/b98febff..f04be0da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=04-05 Stats: 2492 lines in 139 files changed: 600 ins; 771 del; 1121 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Mon Nov 7 15:01:39 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 7 Nov 2022 15:01:39 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v5] In-Reply-To: 
<nLGKXMEfHApyxseQFp882Q-QXNxbkWcChh2Yv1COKG4=.70ef8fea-423b-4034-ad6e-026f2164f3c4@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <nLGKXMEfHApyxseQFp882Q-QXNxbkWcChh2Yv1COKG4=.70ef8fea-423b-4034-ad6e-026f2164f3c4@github.com> Message-ID: <7ZPsmtKqqOeItVnPztGyLhwuHi5Q9WsGI8SYzGkyL8Q=.33d63f63-9cbb-42bd-8d81-6555ed3d67d2@github.com> On Mon, 7 Nov 2022 14:17:40 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Bring windows CallArranger in sync with panama repo I have incorporated additional API changes, described in this document: http://cr.openjdk.java.net/~mcimadamore/panama/session_arenas.html The main change is that `MemorySession` is now a pure lifetime abstraction and no longer implements `AutoCloseable`/`SegmentAllocator`. Instead, a new abstraction called `Arena` should be used for deterministic deallocation use cases. This change allows several simplifications of the `MemorySession` API, as there's no more need to support non-closeable views. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From stuefe at openjdk.org Mon Nov 7 15:35:55 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 7 Nov 2022 15:35:55 GMT Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability In-Reply-To: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> Message-ID: <bEqxnLEmzJXZ1KNSpMSSoPlNA2wIIzeS285A2RNUfQE=.eacdea94-afc1-460f-bb44-bfedfc96a39b@github.com> On Mon, 7 Nov 2022 13:25:53 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Refactor the STEP macro in VMError::report to improve readability. > Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand. > > This enhancement aims to do two things: > 1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer. > 2. Separates the internal step logic from the decision logic which decides if a step should be taken with a STEP_IF(step_name_str, condition) macro > > Testing: tier 1 + GHA Hi Axel, I'm fine with the line break, but unsure about the STEP_IF. I don't think it adds much readability, but it is a widespread patch that will add unnecessary noise for us backporters. Just my opinion, let's see what others think. Cheers, Thomas src/hotspot/share/utilities/vmError.cpp line 593: > 591: controlled_crash(TestCrashInErrorHandler); > 592: return true; > 593: }()) I don't think this is necessary. The tests for secondary crash handling are exhaustive; there is no reason why it should differ from crash condition handling outside of a crash condition. If the signal handler is correctly set up, it should work. If it is not, the prior step is sufficient. ------------- Changes requested by stuefe (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11018 From redestad at openjdk.org Mon Nov 7 15:53:26 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 7 Nov 2022 15:53:26 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v7] In-Reply-To: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> Message-ID: <V-iYMsyFlMR9wRlS6uU4BvjKdtdQX0kp_4ytcucB1FI=.c4edb62a-bad7-4bc3-af71-ef3b555b2af1@github.com> > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ±
7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ±
0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request incrementally with four additional commits since the last revision: - Merge pull request #1 from luhenry/dev/cl4es/8282664-polyhash Switch to forward approach for vectorization - Fix vector loop - fix indexing - Switch to forward approach for vectorization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10847/files - new: https://git.openjdk.org/jdk/pull/10847/files/cdf276de..6f49b5aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=05-06 Stats: 241 lines in 4 files changed: 64 ins; 138 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Mon Nov 7 16:08:31 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 7 Nov 2022 16:08:31 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: <tUUED_AhQ-59DfLrgFc1EijmNCuhz4uF-v15WZGwwQ8=.4e8e62f5-5f27-4a47-a8f0-ebf6adb41e20@github.com> 
References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <tUUED_AhQ-59DfLrgFc1EijmNCuhz4uF-v15WZGwwQ8=.4e8e62f5-5f27-4a47-a8f0-ebf6adb41e20@github.com> Message-ID: <sTcHG3qLfxv_CKBKuSc8uFidNWeL5gT6c6aSU7D47qo=.585659d0-b42e-4d00-9701-853f31487d9a@github.com> On Tue, 25 Oct 2022 16:03:28 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ±
0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ±
0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > I did a quick write up explaining the approach at https://gist.github.com/luhenry/2fc408be6f906ef79aaf4115525b9d0c. Also, you can find details in @richardstartin's [blog post](https://richardstartin.github.io/posts/vectorised-polynomial-hash-codes) Scratch that. I've merged in the forward-iterating vector loop changes @luhenry has worked on, which give a 1.33x speed-up and simplify the vector loop a lot. This also moves the coefficient array to shared memory. Benchmark (size) Mode Cnt Score Error Units StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1195.828 ± 10.956 ns/op StringHashCode.Algorithm.defaultUTF16 10000 avgt 5 1197.123 ± 10.007 ns/op Some micro-optimizations for smaller arrays were disabled for this, but we'll work on getting that back in place before calling it a day. 
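For readers following along, the forward-iterating idea can be sketched in plain Java by emulating the vector lanes with an `int[]`. This is only a scalar model under stated assumptions, not the intrinsic from this PR: `ForwardPolyHash`, `LANES = 8` and the layout of the loop are illustrative choices. Each main-loop iteration multiplies every lane by 31^LANES and adds the next block; the per-lane weights 31^(LANES-1) .. 31^0 are applied once, just before the horizontal reduction.

```java
// Scalar emulation of a forward-iterating vectorized polynomial hash.
// Hypothetical sketch; the lane count and names are assumptions for illustration.
final class ForwardPolyHash {
    static final int LANES = 8;

    // Computes initial * 31^n + a[0] * 31^(n-1) + ... + a[n-1] in wrapping int arithmetic.
    static int hash(int initial, int[] a) {
        // Per-lane weights 31^(LANES-1) .. 31^0, applied only at the final reduction.
        int[] weights = new int[LANES];
        int power = 1;
        for (int lane = LANES - 1; lane >= 0; lane--) {
            weights[lane] = power;
            power *= 31;
        }
        final int stride = power; // 31^LANES

        int bound = a.length - (a.length % LANES);
        int[] acc = new int[LANES]; // emulated vector accumulator
        int scale = 1;              // 31^(number of elements consumed by the main loop)
        for (int i = 0; i < bound; i += LANES) {
            // One "vector" multiply and one "vector" add per iteration.
            for (int lane = 0; lane < LANES; lane++) {
                acc[lane] = acc[lane] * stride + a[i + lane];
            }
            scale *= stride;
        }

        // Fold in the start value, then reduce the weighted lanes.
        int h = initial * scale;
        for (int lane = 0; lane < LANES; lane++) {
            h += acc[lane] * weights[lane];
        }
        // Scalar tail for the elements that do not fill a whole block.
        for (int i = bound; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

With a start value of one this should agree with `Arrays.hashCode(int[])`, and with a start value of zero over a string's chars with `String.hashCode()`, since all arithmetic wraps modulo 2^32 in the same way.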
------------- PR: https://git.openjdk.org/jdk/pull/10847 From matsaave at openjdk.org Mon Nov 7 16:19:43 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 7 Nov 2022 16:19:43 GMT Subject: Integrated: 8295893: Improve printing of Constant Pool Cache Entries In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: <A7Qk1pbLeXbus2-NZjTHOELsPMe63UadBukB8UCh9wQ=.c26347d6-0ec8-4b1b-ad61-ceab083d6a85@github.com> On Tue, 25 Oct 2022 19:37:12 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote: > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. > > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 <Concat0.s/Ljava/lang/String;> > 10 invokedynamic bsm=31 13 <makeConcatWithConstants(Ljava/lang/String;)Ljava/lang/String;> > BSM: REF_invokeStatic 32 <java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;> > arguments[1] = { > 000 > } > 
ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 <Concat0.d/Ljava/lang/String;> > 18 iinc #1 1 > 21 goto 2 > 24 return This pull request has now been integrated. 
Changeset: ba303c04 Author: Matias Saavedra Silva <matsaave at openjdk.org> Committer: Coleen Phillimore <coleenp at openjdk.org> URL: https://git.openjdk.org/jdk/commit/ba303c048eaabdf4ef3a891cc4bd232d69fc4631 Stats: 105 lines in 2 files changed: 83 ins; 2 del; 20 mod 8295893: Improve printing of Constant Pool Cache Entries Reviewed-by: dholmes, coleenp, iklam ------------- PR: https://git.openjdk.org/jdk/pull/10860 From aboldtch at openjdk.org Mon Nov 7 16:32:39 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 7 Nov 2022 16:32:39 GMT Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability In-Reply-To: <bEqxnLEmzJXZ1KNSpMSSoPlNA2wIIzeS285A2RNUfQE=.eacdea94-afc1-460f-bb44-bfedfc96a39b@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> <bEqxnLEmzJXZ1KNSpMSSoPlNA2wIIzeS285A2RNUfQE=.eacdea94-afc1-460f-bb44-bfedfc96a39b@github.com> Message-ID: <XwV6rI2R4xqqtqX2hCql_RjuusRwVL6SsF-iH9JhKeQ=.bc57c2bf-00e4-4c0c-af4d-fa3cfe0c055b@github.com> On Mon, 7 Nov 2022 15:25:14 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Refactor the STEP macro in VMError::report to improve readability. >> Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand. >> >> This enhancement aims to do two things: >> 1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer. >> 2. Separates the internal step logic from the decision logic which decides if a step should be taken with a STEP_IF(step_name_str, condition) macro >> >> Testing: tier 1 + GHA > > src/hotspot/share/utilities/vmError.cpp line 593: > >> 591: controlled_crash(TestCrashInErrorHandler); >> 592: return true; >> 593: }()) > > I don't think this is necessary. 
The tests for secondary crash handling are exhaustive, there is no reason why it should differ from crash condition handling outside of a crash condition. If the signal handler is correctly set up, it should work. If it is not, the prior step is sufficient.

> I don't think it adds much readability, but it is a wide-spread patch that will add unnecessary noise for us backporters. Just my opinion, let's see what others think.

I was worried about that too. This patch initially came about when I was writing #11017, which required rewriting the macro logic and adding an extra nested scope. Because the common case for a step was checking the verbose flag, I first refactored out just the verbose case, and then refactored out the few remaining conditions as well. But I am pretty ambivalent about this change. #11017 contains the newline changes as well, so if more people think this makes backporting harder, it can be dropped.

> I don't think this is necessary. The tests for secondary crash handling are exhaustive, there is no reason why it should differ from crash condition handling outside of a crash condition. If the signal handler is correctly set up, it should work. If it is not, the prior step is sufficient.

I added it because in my initial change (9449501f063e7662a530adf921ae53dadd0312a0), a theoretical crash in a condition would recover but not proceed to the next step. The test change is there to avoid such a regression if someone attempts the same simplification of the macro condition logic. (It would crash before `_current_step` was updated.)
------------- PR: https://git.openjdk.org/jdk/pull/11018 From coleenp at openjdk.org Mon Nov 7 17:14:09 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 7 Nov 2022 17:14:09 GMT Subject: RFR: 8296472: Remove a JVMTI ObjectLocker Message-ID: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. Tested with tier1-4, and jvmti and jdi tests locally. ------------- Commit messages: - 8296472: Remove a JVMTI ObjectLocker Changes: https://git.openjdk.org/jdk/pull/11023/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11023&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296472 Stats: 6 lines in 2 files changed: 1 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/11023.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11023/head:pull/11023 PR: https://git.openjdk.org/jdk/pull/11023 From alanb at openjdk.org Mon Nov 7 17:58:30 2022 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 7 Nov 2022 17:58:30 GMT Subject: RFR: 8296472: Remove a JVMTI ObjectLocker In-Reply-To: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> Message-ID: <QDVAreN7KBmH3mUWgVlrBDXJ1UOkuT8qb3lFIjV8nLY=.f209fb49-ee8f-44a6-8145-c142ecb8676c@github.com> On Mon, 7 Nov 2022 17:07:01 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. > Tested with tier1-4, and jvmti and jdi tests locally. src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java line 204: > 202: * @see java.lang.instrument.Instrumentation#appendToSystemClassLoaderSearch > 203: */ > 204: synchronized void appendToClassPathForInstrumentation(String path) { We might not need this. 
appendClasspath is thread safe so it's okay for several agents to call appendToSystemClassLoaderSearch at around the same time. I don't think it needs to synchronize with anything else.

-------------

PR: https://git.openjdk.org/jdk/pull/11023

From vlivanov at openjdk.org Mon Nov 7 18:06:41 2022
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 7 Nov 2022 18:06:41 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
Message-ID: <ZUxveAxaS5YNw-Dt6LtOKkPe9Sya2c59PaBObFKKTIw=.f66f0a2c-c2c3-4382-96b6-cf1f47172349@github.com>

On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets:
>
> - x86_64: AVX, AVX2 and AVX512
> - aarch64: platforms that support the advanced SIMD instructions
>
> Microbenchmark results (Note: ChaCha20-Poly1305 numbers do not include the pending Poly1305 intrinsics to be delivered in #10582)
>
> x86_64
> Processor: 4x Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz
>
> Java only (-XX:-UseChaCha20Intrinsics)
> --------------------------------------
> Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
> ChaCha20.decrypt                 256  thrpt   40   772956.829 ±  4434.965  ops/s
> ChaCha20.decrypt                1024  thrpt   40   230478.075 ±   660.617  ops/s
> ChaCha20.decrypt                4096  thrpt   40    61504.367 ±   187.485  ops/s
> ChaCha20.decrypt               16384  thrpt   40    15671.893 ±    59.860  ops/s
> ChaCha20.encrypt                 256  thrpt   40   793708.698 ±  3587.562  ops/s
> ChaCha20.encrypt                1024  thrpt   40   232413.842 ±   808.766  ops/s
> ChaCha20.encrypt                4096  thrpt   40    61586.483 ±    94.821  ops/s
> ChaCha20.encrypt               16384  thrpt   40    15749.637 ±    34.497  ops/s
>
> ChaCha20Poly1305.decrypt         256  thrpt   40   219991.514 ±  2117.364  ops/s
> ChaCha20Poly1305.decrypt        1024  thrpt   40   101672.568 ±  1921.214  ops/s
> ChaCha20Poly1305.decrypt        4096  thrpt   40    32582.073 ±   946.061  ops/s
> ChaCha20Poly1305.decrypt       16384  thrpt   40     8485.793 ±    26.348  ops/s
> ChaCha20Poly1305.encrypt         256  thrpt   40   291605.327 ±  2893.898  ops/s
> ChaCha20Poly1305.encrypt        1024  thrpt   40   121034.948 ±  2545.312  ops/s
> ChaCha20Poly1305.encrypt        4096  thrpt   40    32657.343 ±   114.322  ops/s
> ChaCha20Poly1305.encrypt       16384  thrpt   40     8527.834 ±    33.711  ops/s
>
> Intrinsics enabled (-XX:UseAVX=1)
> ---------------------------------
> Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
> ChaCha20.decrypt                 256  thrpt   40  1293211.662 ±  9833.892  ops/s
> ChaCha20.decrypt                1024  thrpt   40   450135.559 ±  1614.303  ops/s
> ChaCha20.decrypt                4096  thrpt   40   123675.797 ±   576.160  ops/s
> ChaCha20.decrypt               16384  thrpt   40    31707.566 ±    93.988  ops/s
> ChaCha20.encrypt                 256  thrpt   40  1338667.215 ± 12012.240  ops/s
> ChaCha20.encrypt                1024  thrpt   40   453682.363 ±  2559.322  ops/s
> ChaCha20.encrypt                4096  thrpt   40   124785.645 ±   394.535  ops/s
> ChaCha20.encrypt               16384  thrpt   40    31788.969 ±    90.770  ops/s
>
> ChaCha20Poly1305.decrypt         256  thrpt   40   250683.639 ±  3990.340  ops/s
> ChaCha20Poly1305.decrypt        1024  thrpt   40   131000.144 ±  2895.410  ops/s
> ChaCha20Poly1305.decrypt        4096  thrpt   40    45215.542 ±  1368.148  ops/s
> ChaCha20Poly1305.decrypt       16384  thrpt   40    11879.307 ±    55.006  ops/s
> ChaCha20Poly1305.encrypt         256  thrpt   40   355255.774 ±  5397.267  ops/s
> ChaCha20Poly1305.encrypt        1024  thrpt   40   156057.380 ±  4294.091  ops/s
> ChaCha20Poly1305.encrypt        4096  thrpt   40    47016.845 ±  1618.779  ops/s
> ChaCha20Poly1305.encrypt       16384  thrpt   40    12113.919 ±    45.792  ops/s
>
> Intrinsics enabled (-XX:UseAVX=2)
> ---------------------------------
> Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
> ChaCha20.decrypt                 256  thrpt   40  1824729.604 ± 12130.198  ops/s
> ChaCha20.decrypt                1024  thrpt   40   746024.477 ±  3921.472  ops/s
> ChaCha20.decrypt                4096  thrpt   40   219662.823 ±  2128.901  ops/s
> ChaCha20.decrypt               16384  thrpt   40    57198.868 ±   221.973  ops/s
> ChaCha20.encrypt                 256  thrpt   40  1893810.127 ± 21870.718  ops/s
> ChaCha20.encrypt                1024  thrpt   40   758024.511 ±  5414.552  ops/s
> ChaCha20.encrypt                4096  thrpt   40   224032.805 ±   935.309  ops/s
> ChaCha20.encrypt               16384  thrpt   40    58112.296 ±   498.048  ops/s
>
> ChaCha20Poly1305.decrypt         256  thrpt   40   260529.149 ±  4298.662  ops/s
> ChaCha20Poly1305.decrypt        1024  thrpt   40   144967.984 ±  4558.697  ops/s
> ChaCha20Poly1305.decrypt        4096  thrpt   40    50047.575 ±   171.204  ops/s
> ChaCha20Poly1305.decrypt       16384  thrpt   40    13976.999 ±    72.299  ops/s
> ChaCha20Poly1305.encrypt         256  thrpt   40   378971.408 ±  9324.721  ops/s
> ChaCha20Poly1305.encrypt        1024  thrpt   40   179361.248 ±  7968.109  ops/s
> ChaCha20Poly1305.encrypt        4096  thrpt   40    55727.145 ±  2860.765  ops/s
> ChaCha20Poly1305.encrypt       16384  thrpt   40    14205.830 ±    59.411  ops/s
>
> Intrinsics enabled (-XX:UseAVX=3)
> ---------------------------------
> Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
> ChaCha20.decrypt                 256  thrpt   40  1182958.956 ±  7782.532  ops/s
> ChaCha20.decrypt                1024  thrpt   40  1003530.400 ± 10315.996  ops/s
> ChaCha20.decrypt                4096  thrpt   40   339428.341 ±  2376.804  ops/s
> ChaCha20.decrypt               16384  thrpt   40    92903.498 ±  1112.425  ops/s
> ChaCha20.encrypt                 256  thrpt   40  1266584.736 ±  5101.597  ops/s
> ChaCha20.encrypt                1024  thrpt   40  1059717.173 ±  9435.649  ops/s
> ChaCha20.encrypt                4096  thrpt   40   350520.581 ±  2787.593  ops/s
> ChaCha20.encrypt               16384  thrpt   40    95181.548 ±  1638.579  ops/s
>
> ChaCha20Poly1305.decrypt         256  thrpt   40   200722.479 ±  2045.896  ops/s
> ChaCha20Poly1305.decrypt        1024  thrpt   40   124660.386 ±  3869.517  ops/s
> ChaCha20Poly1305.decrypt        4096  thrpt   40    44059.327 ±   143.765  ops/s
> ChaCha20Poly1305.decrypt       16384  thrpt   40    12412.936 ±    54.845  ops/s
> ChaCha20Poly1305.encrypt         256  thrpt   40   274528.005 ±  2945.416  ops/s
> ChaCha20Poly1305.encrypt        1024  thrpt   40   145146.188 ±   857.254  ops/s
> ChaCha20Poly1305.encrypt        4096  thrpt   40    47045.637 ±   128.049  ops/s
> ChaCha20Poly1305.encrypt       16384  thrpt   40    12643.929 ±    55.748  ops/s
>
> aarch64
> Processor: 2 x CPU implementer : 0x41, architecture: 8, variant : 0x3,
> part : 0xd0c, revision : 1
>
> Java only (-XX:-UseChaCha20Intrinsics)
> --------------------------------------
> Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
> ChaCha20.decrypt                 256  thrpt   40  1301037.920 ±  1734.836  ops/s
> ChaCha20.decrypt                1024  thrpt   40   387115.013 ±  1122.264  ops/s
> ChaCha20.decrypt                4096  thrpt   40   102591.108 ±   229.456  ops/s
> ChaCha20.decrypt               16384  thrpt   40    25878.583 ±    89.351  ops/s
> ChaCha20.encrypt                 256  thrpt   40  1332737.880 ±  2478.508  ops/s
> ChaCha20.encrypt                1024  thrpt   40   390288.663 ±  2361.851  ops/s
> ChaCha20.encrypt                4096  thrpt   40   101882.728 ±   744.907  ops/s
> ChaCha20.encrypt               16384  thrpt   40    26001.888 ±    71.907  ops/s
>
> ChaCha20Poly1305.decrypt         256  thrpt   40   351189.393 ±  2209.148  ops/s
> ChaCha20Poly1305.decrypt        1024  thrpt   40   142960.999 ±   361.619  ops/s
> ChaCha20Poly1305.decrypt        4096  thrpt   40    42437.822 ±    85.557  ops/s
> ChaCha20Poly1305.decrypt       16384  thrpt   40    11173.152 ±    24.969  ops/s
> ChaCha20Poly1305.encrypt         256  thrpt   40   444870.664 ± 12571.799  ops/s
> ChaCha20Poly1305.encrypt        1024  thrpt   40   158481.143 ±  2149.208  ops/s
> ChaCha20Poly1305.encrypt        4096  thrpt   40    43610.721 ±   282.795  ops/s
> ChaCha20Poly1305.encrypt       16384  thrpt   40    11150.783 ±    27.911  ops/s
>
> Intrinsics enabled
> ------------------
> Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
> ChaCha20.decrypt                 256  thrpt   40  1907215.648 ±  3163.767  ops/s
> ChaCha20.decrypt                1024  thrpt   40   631804.007 ±   736.430  ops/s
> ChaCha20.decrypt                4096  thrpt   40   172280.991 ±   362.190  ops/s
> ChaCha20.decrypt               16384  thrpt   40    44150.254 ±    98.927  ops/s
> ChaCha20.encrypt                 256  thrpt   40  1990050.859 ±  6380.625  ops/s
> ChaCha20.encrypt                1024  thrpt   40   636574.405 ±  3332.471  ops/s
> ChaCha20.encrypt                4096  thrpt   40   173258.615 ±   327.199  ops/s
> ChaCha20.encrypt               16384  thrpt   40    44191.925 ±    72.996  ops/s
>
> ChaCha20Poly1305.decrypt         256  thrpt   40   360555.774 ±  1988.467  ops/s
> ChaCha20Poly1305.decrypt        1024  thrpt   40   162093.489 ±   413.684  ops/s
> ChaCha20Poly1305.decrypt        4096  thrpt   40    50799.888 ±   110.955  ops/s
> ChaCha20Poly1305.decrypt       16384  thrpt   40    13560.165 ±    32.208  ops/s
> ChaCha20Poly1305.encrypt         256  thrpt   40   458079.724 ± 13746.235  ops/s
> ChaCha20Poly1305.encrypt        1024  thrpt   40   188228.966 ±  3498.480  ops/s
> ChaCha20Poly1305.encrypt        4096  thrpt   40    52665.733 ±   151.740  ops/s
> ChaCha20Poly1305.encrypt       16384  thrpt   40    13606.192 ±    52.134  ops/s
>
> Special thanks to the folks who have made many helpful comments while this PR was in draft form.

src/hotspot/cpu/x86/macroAssembler_x86.hpp line 989:

> 987: bool multi_block);
> 988:
> 989: // ChaCha20-Poly1305 macroAssembler defs

These methods can also be moved to `stubGenerator_x86_64.hpp`/`stubGenerator_x86_64_chacha.cpp`. There are no other usages besides the x86-64-specific CC20 stub.

-------------

PR: https://git.openjdk.org/jdk/pull/7702

From vlivanov at openjdk.org Mon Nov 7 18:11:40 2022
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 7 Nov 2022 18:11:40 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
Message-ID: <hcljJAg4vgyddG_TnF-Eb0_Va1-B7DoG1Wu9VkipsQI=.3a053e67-03e5-4fa3-a81f-71ab3be4e2ef@github.com>

On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce.
Intrinsics have been written for the following platforms and instruction sets:
>
> - x86_64: AVX, AVX2 and AVX512
> - aarch64: platforms that support the advanced SIMD instructions

A note on code formatting: [HotSpot coding style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md) mentions "Indentation levels are two columns", but some portions of the newly added code use 4 column indentation.
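For readers unfamiliar with the "core block function" named in the quoted PR description: it is built from repeated applications of the ChaCha20 quarter round defined in RFC 8439. The following is a minimal scalar Java sketch of that quarter round only. It is not the intrinsic code from this PR and not the JDK's `ChaCha20Cipher` implementation; the class name `ChaChaQuarterRoundSketch` is hypothetical, and the input words are the quarter-round test vector from RFC 8439, section 2.1.1.

```java
final class ChaChaQuarterRoundSketch {
    // Applies one ChaCha20 quarter round in place to the four 32-bit
    // state words at indices a, b, c and d (add, xor, rotate-left).
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    public static void main(String[] args) {
        // Input words taken from the RFC 8439 section 2.1.1 test vector.
        int[] s = {0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567};
        quarterRound(s, 0, 1, 2, 3);
        // prints "ea2a92f4 cb1cf8ce 4581472e 5881c4bb"
        System.out.println(String.format("%08x %08x %08x %08x", s[0], s[1], s[2], s[3]));
    }
}
```

The full block function runs this step over the columns and diagonals of a 16-word state for 20 rounds. Since each step is only 32-bit additions, xors and rotates, it is the kind of loop that SIMD instruction sets such as those listed in the PR description can plausibly process several words at a time.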
-------------

PR: https://git.openjdk.org/jdk/pull/7702

From coleenp at openjdk.org Mon Nov 7 18:16:49 2022
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 7 Nov 2022 18:16:49 GMT
Subject: RFR: 8296472: Remove a JVMTI ObjectLocker
In-Reply-To: <QDVAreN7KBmH3mUWgVlrBDXJ1UOkuT8qb3lFIjV8nLY=.f209fb49-ee8f-44a6-8145-c142ecb8676c@github.com>
References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <QDVAreN7KBmH3mUWgVlrBDXJ1UOkuT8qb3lFIjV8nLY=.f209fb49-ee8f-44a6-8145-c142ecb8676c@github.com>
Message-ID: <sJnsLBMHB5OMv9q3-aSnCzx6I9JNTN5F1MkER2MZdfM=.17d54669-0ee7-4e89-8b42-bbbe375147dd@github.com>

On Mon, 7 Nov 2022 17:56:00 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function.
>> Tested with tier1-4, and jvmti and jdi tests locally.
>
> src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java line 204:
>
>> 202: * @see java.lang.instrument.Instrumentation#appendToSystemClassLoaderSearch
>> 203: */
>> 204: synchronized void appendToClassPathForInstrumentation(String path) {
>
> We might not need this. appendClasspath is thread safe so it's okay for several agents to call appendToSystemClassLoaderSearch at around the same time. I don't think it needs to synchronize with anything else.

I traced it down to this. Is this why it's already synchronized?

appendClassPath -> ucp.addFile -> addURL

    public synchronized void addURL(URL url) {
        if (closed || url == null)
            return;
        synchronized (unopenedUrls) {
            if (! path.contains(url)) {
                unopenedUrls.addLast(url);
                path.add(url);
            }
        }
    }

-------------

PR: https://git.openjdk.org/jdk/pull/11023

From alanb at openjdk.org Mon Nov 7 18:36:33 2022
From: alanb at openjdk.org (Alan Bateman)
Date: Mon, 7 Nov 2022 18:36:33 GMT
Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call
In-Reply-To: <sJnsLBMHB5OMv9q3-aSnCzx6I9JNTN5F1MkER2MZdfM=.17d54669-0ee7-4e89-8b42-bbbe375147dd@github.com>
References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <QDVAreN7KBmH3mUWgVlrBDXJ1UOkuT8qb3lFIjV8nLY=.f209fb49-ee8f-44a6-8145-c142ecb8676c@github.com> <sJnsLBMHB5OMv9q3-aSnCzx6I9JNTN5F1MkER2MZdfM=.17d54669-0ee7-4e89-8b42-bbbe375147dd@github.com>
Message-ID: <m63k-bjMSlEYStE-1OtwBx4ZlVFjryvRBCyEGJUkAEo=.ff58876c-84c3-477c-8232-02345ca5db16@github.com>

On Mon, 7 Nov 2022 18:11:29 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java line 204:
>>
>>> 202: * @see java.lang.instrument.Instrumentation#appendToSystemClassLoaderSearch
>>> 203: */
>>> 204: synchronized void appendToClassPathForInstrumentation(String path) {
>>
>> We might not need this. appendClasspath is thread safe so it's okay for several agents to call appendToSystemClassLoaderSearch at around the same time. I don't think it needs to synchronize with anything else.
>
> I traced it down to this. Is this why it's already synchronized?
> appendClassPath -> ucp.addFile -> addURL
>     public synchronized void addURL(URL url) {
>         if (closed || url == null)
>             return;
>         synchronized (unopenedUrls) {
>             if (! path.contains(url)) {
>                 unopenedUrls.addLast(url);
>                 path.add(url);
>             }
>         }
>     }

Yes, URLClassPath is already synchronized. I'm not 100% sure why the ObjectLocker is there, but it's possible that it pre-dates parallel capable class loaders, so it dates from a time when findClass/loadClass were synchronized on "this".
------------- PR: https://git.openjdk.org/jdk/pull/11023 From coleenp at openjdk.org Mon Nov 7 18:36:33 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 7 Nov 2022 18:36:33 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call In-Reply-To: <m63k-bjMSlEYStE-1OtwBx4ZlVFjryvRBCyEGJUkAEo=.ff58876c-84c3-477c-8232-02345ca5db16@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <QDVAreN7KBmH3mUWgVlrBDXJ1UOkuT8qb3lFIjV8nLY=.f209fb49-ee8f-44a6-8145-c142ecb8676c@github.com> <sJnsLBMHB5OMv9q3-aSnCzx6I9JNTN5F1MkER2MZdfM=.17d54669-0ee7-4e89-8b42-bbbe375147dd@github.com> <m63k-bjMSlEYStE-1OtwBx4ZlVFjryvRBCyEGJUkAEo=.ff58876c-84c3-477c-8232-02345ca5db16@github.com> Message-ID: <IJP6vEWD0h7ok3bmtAY6YP1eT-JOexv6dgqQ7rRRfkM=.1b69cdbd-25eb-4d1b-a59f-d8221e54d132@github.com> On Mon, 7 Nov 2022 18:29:56 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> I traced it down to this: Is this why it's already synchronized ? >> appendClassPath -> ucp.addFile -> addURL >> public synchronized void addURL(URL url) { >> if (closed || url == null) >> return; >> synchronized (unopenedUrls) { >> if (! path.contains(url)) { >> unopenedUrls.addLast(url); >> path.add(url); >> } >> } >> } > > Yes, URLClassPath is already synchronized. I'm not 100% sure why the ObjectLocker is there but it's possible to pre-dates parallel capable class loaders so it dates from a time when findClass/loadClass were synchronized on "this". Yes, the code's been there for forever. Thanks! 
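The point settled in this exchange, that a caller does not need its own lock when the method it delegates to is already synchronized, can be illustrated with a small Java sketch. Everything here is hypothetical: `AppendWithoutOuterLockSketch` and its nested `SearchPath` class are stand-ins with the same shape as the quoted `addURL`, not the JDK's `URLClassPath` or `ClassLoaders` code.

```java
import java.util.ArrayList;
import java.util.List;

final class AppendWithoutOuterLockSketch {
    // Hypothetical stand-in for a search path whose append method is
    // itself synchronized and ignores null or duplicate entries.
    static final class SearchPath {
        private final List<String> entries = new ArrayList<>();

        synchronized void add(String entry) {
            if (entry == null || entries.contains(entry)) {
                return;
            }
            entries.add(entry);
        }

        synchronized int size() {
            return entries.size();
        }
    }

    // Several "agents" append concurrently with no lock held by the caller.
    // Each agent adds its own entries plus one entry shared by all agents.
    static int appendConcurrently(int agents, int entriesPerAgent) {
        SearchPath path = new SearchPath();
        List<Thread> threads = new ArrayList<>();
        for (int a = 0; a < agents; a++) {
            final int agent = a;
            Thread t = new Thread(() -> {
                path.add("shared.jar");
                for (int i = 0; i < entriesPerAgent; i++) {
                    path.add("agent" + agent + "-" + i + ".jar");
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return path.size();
    }

    public static void main(String[] args) {
        // 8 agents x 100 unique entries + 1 shared entry: prints 801
        System.out.println(appendConcurrently(8, 100));
    }
}
```

Because the check for duplicates and the append happen inside one synchronized method, no entry is lost or added twice even though the callers take no lock of their own. An outer `synchronized` on the calling method would only add a second, redundant monitor acquisition in this sketch.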
-------------

PR: https://git.openjdk.org/jdk/pull/11023

From vlivanov at openjdk.org Mon Nov 7 18:50:37 2022
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 7 Nov 2022 18:50:37 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
Message-ID: <phhl3Y4SAoIsaUUS8zB3DRihsyXboH4E19BGSeyFBGI=.22ab4805-217e-4de4-be8d-8821856885c6@github.com>

On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets:
>
> - x86_64: AVX, AVX2 and AVX512
> - aarch64: platforms that support the advanced SIMD instructions

src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 870:

> 868: */
> 869: @IntrinsicCandidate
> 870: private static int _chaCha20Block(int[] initState, byte[] result) {

Seems like there are 2 major naming conventions for intrinsic helper methods: prepend "impl" (e.g., `CounterMode.implCrypt`) or append "0" (`GaloisCounterMode.implGCMCrypt0`). I'd prefer to see either one used here.

-------------

PR: https://git.openjdk.org/jdk/pull/7702

From vlivanov at openjdk.org Mon Nov 7 19:02:42 2022
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 7 Nov 2022 19:02:42 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics
In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
Message-ID: <i3HEiQ4tE7VHz-GczyoXN2rSnqIY3LZ6GwiKdIQevBg=.f6deb4c7-244a-4467-b7fb-8402d72e62e5@github.com>

On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce.
On AVX512 vs AVX1/2 discussion: it makes sense to consider doing runtime dispatching between AVX512 and AVX2 stubs depending on input size.
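A minimal sketch of that dispatch idea follows; it is an illustration only, not code from this PR. In the plain ChaCha20 rows posted earlier in this thread, the AVX512 run is slower than the AVX2 run at 256 bytes and faster from 1024 bytes upward on that Xeon, so a selector could switch on the input length. The class name, the `select` method and the 1024-byte cut-over are hypothetical, and in HotSpot the choice would presumably live in the stub generator or the stub itself rather than in Java.

```java
class StubDispatchSketch {
    /** The two stub flavours being chosen between (names are illustrative). */
    enum Stub { AVX2, AVX512 }

    // Hypothetical cut-over in bytes. The posted numbers only show that the
    // crossover for plain ChaCha20 lies somewhere between 256 and 1024 bytes
    // on one machine; a real threshold would need tuning per microarchitecture.
    static final int AVX512_MIN_BYTES = 1024;

    /** Picks the wide stub only when it is supported and the input is large enough. */
    static Stub select(int dataSize, boolean avx512Supported) {
        return (avx512Supported && dataSize >= AVX512_MIN_BYTES) ? Stub.AVX512 : Stub.AVX2;
    }

    public static void main(String[] args) {
        // Same data sizes as the benchmark tables.
        for (int size : new int[] {256, 1024, 4096, 16384}) {
            System.out.println(size + " bytes -> " + select(size, true));
        }
    }
}
```

With a single threshold the selection costs one compare and branch per call, which seems small next to the per-block work; whether that holds in practice would have to be measured.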
As our previous experience shows (e.g., `AVX3Threshold`), running AVX512 code induces some overhead (varies between microarchitectures) which in some cases defeats the purpose of using AVX512. ------------- PR: https://git.openjdk.org/jdk/pull/7702 From coleenp at openjdk.org Mon Nov 7 19:04:36 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 7 Nov 2022 19:04:36 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v2] In-Reply-To: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> Message-ID: <vfJLcnVI-70yd29yzL9txMyC71hB6-OAHwMeG8F9YfI=.79f9451f-ee42-4fd5-ba85-bac43c9ae4b3@github.com> > This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. > Tested with tier1-4, and jvmti and jdi tests locally. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Revert ClassLoaders.java change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11023/files - new: https://git.openjdk.org/jdk/pull/11023/files/0c0e1654..781b44fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11023&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11023&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11023.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11023/head:pull/11023 PR: https://git.openjdk.org/jdk/pull/11023 From coleenp at openjdk.org Mon Nov 7 19:04:37 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 7 Nov 2022 19:04:37 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call In-Reply-To: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> References: 
<WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> Message-ID: <eG5MUdY6OW8_PWIeGLtaQXloud_lOoK6qghnMsqO73c=.4dd5e8ee-92e9-473f-8719-c01e9dc47876@github.com> On Mon, 7 Nov 2022 17:07:01 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. > Tested with tier1-4, and jvmti and jdi tests locally. I reran jvmti tests on the change to revert ClassLoaders.java. ------------- PR: https://git.openjdk.org/jdk/pull/11023 From psandoz at openjdk.org Mon Nov 7 19:11:15 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 7 Nov 2022 19:11:15 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v6] In-Reply-To: <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> Message-ID: <6pOPurF5NgGOYP73lLadj7jzY6FVrCUl9Fkh7MkBfi0=.7ecd3ff9-a2ac-474f-8270-a30bb0f56c92@github.com> On Mon, 7 Nov 2022 15:00:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Make memory session a pure lifetime abstraction src/java.base/share/classes/java/lang/foreign/Arena.java line 37: > 35: * This session is created with the arena, and is closed when the arena is {@linkplain #close() closed}. 
> 36: * Furthermore, all the native segments {@linkplain #allocate(long, long) allocated} by the arena are associated > 37: * with that session. I think we can simplify the wording by saying an arena has a session: Suggestion: * An arena is a {@linkplain AutoCloseable closeable} segment allocator that has a {@link #session() memory session}. * The arena's session is created when the arena is created, and is closed when the arena is {@linkplain #close() closed}. * All native segments {@linkplain #allocate(long, long) allocated} by the arena are associated * with its session. src/java.base/share/classes/java/lang/foreign/Arena.java line 65: > 63: * The {@link MemorySegment#address()} of the returned memory segment is the starting address of the > 64: * allocated off-heap memory region backing the segment. Moreover, the {@linkplain MemorySegment#address() address} > 65: * of the returned segment is aligned according the provided alignment constraint. Suggestion: /** * Creates a native memory segment with the given size (in bytes) and alignment constraint (in bytes). * The returned segment is associated with the arena's memory session. * The segment's {@link MemorySegment#address() address} is the starting address of the * allocated off-heap memory region backing the segment, and the address is * aligned according the provided alignment constraint. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From psandoz at openjdk.org Mon Nov 7 19:23:28 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 7 Nov 2022 19:23:28 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v6] In-Reply-To: <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> Message-ID: <m-DUj0-tpTQ8gspRH6qB27A5esSvI12S30QhZi2YOgs=.2edbc994-7ae2-4aeb-bf56-dd0728df0adf@github.com> On Mon, 7 Nov 2022 15:00:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Make memory session a pure lifetime abstraction src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 357: > 355: > 356: /** > 357: * Creates an access var handle that can be used to access a memory segment at the layout selected by the given layout path, Suggestion: * Creates a var handle that can be used to access a memory segment at the layout selected by the given layout path, ------------- PR: https://git.openjdk.org/jdk/pull/10872 From psandoz at openjdk.org Mon Nov 7 19:34:26 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 7 Nov 2022 19:34:26 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v6] In-Reply-To: <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> References: 
<x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> Message-ID: <_3OEoupwOnEbpnA1ApYUiH3GhzDlHyOxearctDT85a0=.493601cf-85d0-4a5d-959d-d01da03e4e83@github.com> On Mon, 7 Nov 2022 15:00:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Make memory session a pure lifetime abstraction src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 104: > 102: * Every memory segment is associated with a {@linkplain MemorySession memory session}. This ensures that access operations > 103: * on a memory segment cannot occur when the region of memory which backs the memory segment is no longer available > 104: * (e.g. after the memory session associated with the accessed memory segment is no longer {@linkplain MemorySession#isAlive() alive}. Missing close brace: Suggestion: * (e.g., after the memory session associated with the accessed memory segment is no longer {@linkplain MemorySession#isAlive() alive}). 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From alanb at openjdk.org Mon Nov 7 19:46:47 2022 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 7 Nov 2022 19:46:47 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call In-Reply-To: <eG5MUdY6OW8_PWIeGLtaQXloud_lOoK6qghnMsqO73c=.4dd5e8ee-92e9-473f-8719-c01e9dc47876@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <eG5MUdY6OW8_PWIeGLtaQXloud_lOoK6qghnMsqO73c=.4dd5e8ee-92e9-473f-8719-c01e9dc47876@github.com> Message-ID: <b4zSxq2V_ilH1vtdqO-prkDKHFWD91Rag5ILpKyEXU8=.c14e3558-d9a8-46c9-82ec-cdc988309346@github.com> On Mon, 7 Nov 2022 19:00:40 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > I reran jvmti tests on the change to revert ClassLoaders.java. You can revert the comment too because it would be confusing to say that it locks the class loader when it doesn't. ------------- PR: https://git.openjdk.org/jdk/pull/11023 From psandoz at openjdk.org Mon Nov 7 19:47:42 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 7 Nov 2022 19:47:42 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v6] In-Reply-To: <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> Message-ID: <4Sj58ZiAdyXGqIagMDMMD3dTWwjKt8pTudl1JHnwp4Q=.cf41a458-e19d-42b7-b465-b3c40db144ac@github.com> On Mon, 7 Nov 2022 15:00:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Make memory session a pure lifetime abstraction src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 312: > 310: * </table></blockquote> > 311: * > 312: * Heap segment can only be accessed using a layout whose alignment is smaller or equal to the Suggestion: * Heap segments can only be accessed using a layout whose alignment is smaller or equal to the ------------- PR: https://git.openjdk.org/jdk/pull/10872 From duke at openjdk.org Mon Nov 7 19:58:29 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Mon, 7 Nov 2022 19:58:29 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <TRoDLfcCCxNIGwWPb4W1eJtT7BAp2zjZhFjvSH0aleM=.1fa97bf3-ec5c-464b-86a1-7d320b1f1178@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> <TRoDLfcCCxNIGwWPb4W1eJtT7BAp2zjZhFjvSH0aleM=.1fa97bf3-ec5c-464b-86a1-7d320b1f1178@github.com> Message-ID: <naoq1y802BM2oWDp-qLDcaGWfHUx9egRUNgeNXoFuOM=.3a96c815-ddfb-4b32-8d35-29de94db7295@github.com> On Fri, 4 Nov 2022 05:14:50 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> This is an attempt to unify the two different approaches for using archived heap regions. Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions. >> >> In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap. >> When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific. 
>> When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region based collectors.
>>
>> This PR attempts to add a new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach the VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also allows the flexibility for the GC implementation to decide where and how to reserve the space for the archived regions. For instance, the G1 implementation can continue to attempt to allocate the space towards the end of the heap.
>> This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions, viz. G1, serial, parallel and epsilon.
>
> I am not sure if the existing implementation is 100% correct, but for these test cases, I think we are probably saved by this code:
>
>
>     if (!is_aligned(relocated_closed_heap_region_bottom, HeapRegion::GrainBytes)) {
>       // Align the bottom of the closed archive heap regions at G1 region boundary.
>       // This will avoid the situation where the highest open region and the lowest
>       // closed region sharing the same G1 region. Otherwise we will fail to map the
>       // open regions.
>       size_t align = size_t(relocated_closed_heap_region_bottom) % HeapRegion::GrainBytes;
>       delta -= align;
>       log_info(cds)("CDS heap data needs to be relocated lower by a further " SIZE_FORMAT
>                     " bytes to " INTX_FORMAT " to be aligned with HeapRegion::GrainBytes",
>                     align, delta);
>       set_shared_heap_runtime_delta(delta);
>       relocated_closed_heap_region_bottom = heap_region_runtime_start_address(si);
>       _heap_pointers_need_patching = true;
>     }
>
>
> G1 regions are at least 1MB, and are always a power of 2.
>
> By patching SharedStringsStress.java with this, I can get the CA1 and OA0 regions to be not aligned by GrainBytes, but that doesn't seem to cause the test to fail.
>
>
> - TestCommon.concat(vmOptionsPrefix, "HelloString"));
> + TestCommon.concat(vmOptionsPrefix, "-Xlog:cds=debug", "-Xmx6g", "HelloString"));
>
>
> In any case, I think we can consider first changing the way the regions are written ([JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344)) so that they can be more easily mapped by various collectors.
>
> (Also, tactically, we should probably first change G1 to use the new "Uniform API" you are thinking about, but leave the other collectors unchanged. This way, we can gradually test things out and fix the other collectors in subsequent RFEs).
>
> Currently, when writing the archived heap, we allocate a G1 region and write objects into it, from bottom to top. When it fills up, we allocate another G1 region that's immediately below, and start filling it from bottom to top. At the end, we merge all the fully-filled regions into the CA0 region, and make the last, half-filled region CA1.
>
> (Same for the OA0, OA1 regions, but usually the OA0 region never has more than 1MB of objects, so we'd never have the OA1 region).
>
> This is kind of kludgy. We should be able to first determine all objects to be archived, and then write them out as a single contiguous "closed" region, and a single contiguous "open" region. When filling out these regions, we can pack the objects so that they will never cross a 1MB boundary.
>
> Also, I think it may not even be worthwhile to have the "closed" region and treat it specially at runtime. We can have just a single contiguous block of archived objects like this, where S are the String objects and their char arrays, and O are the other types of objects
>
>
> OOOOOOOOOOOSSSSSSSSSSS
>
>
> At runtime, we allocate enough G1 regions from the top of the heap to accommodate the archived objects, and put a dummy object at the bottom to fix the bottom-most region.
>
> (The reason we align the archived regions to the top of the G1 heap is that the top of the heap usually has the same narrowOop for various heap sizes, so we can usually avoid patching the embedded oop pointers.
>
> This is a trade off with other collectors, which may not allow you to start allocating memory from the top. We may want to reconsider this.)
>
> All the Strings are always in the interned table so they will never be collected. Also, we already computed their hashcode, so they are never written into (unless you `synchronize` on them at runtime). So for the region(s) that contain only the S objects,
> we can effectively share the memory across multiple processes, and the GC will never collect them.
>
> Anyway, we usually just have a few MBs of archived objects, so it may not matter whether we keep them immutable or not.
> *******
>
> I want to thank you for starting working in this area. Going forward, I think we need more discussion and design before we can decide exactly what to do.

@iklam thanks for sharing the information and details on the future work in this space.

> By patching SharedStringsStress.java with this, I can get the CA1 and OA0 regions to be not aligned by GrainBytes, but that doesn't seem to cause the test to fail.

I was actually referring to CA0 and CA1 in my figures (which I realized was not clear in my explanation earlier).
Anyway, I now understand the existing mechanism works fine because the following conditions are maintained (which you have already mentioned in your comment):
1. G1 regions are at least 1MB, and are always a power of 2.
2. At dump time the objects are placed such that they do not cross `HeapRegion::min_region_size_in_words()` which I believe is 1M.

Because of these two constraints, a change in G1 region size at run time cannot result in objects crossing the region boundary.
So if I update the G1 code such that at run time the regions are mapped at a 1M boundary then I can get rid of the problem of objects crossing region boundary and the two tests also pass.

> In any case, I think we can consider first changing the way the regions are written ([JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344)) so that they can be more easily mapped by various collectors.

I agree ([JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344)) would make it easier to map them at run time and would be happy to contribute to it in any way possible. But again, that's a GC policy specific implementation detail. I guess you would agree we need to de-couple the CDS code from the GC policy details. While JDK-8296344 aims at decoupling the code at dump time, my aim with this PR is to achieve the same at run time by having GC-agnostic APIs. Moreover, the dump time mechanism should not affect the APIs used for mapping regions at run time (though the implementation may need to be adjusted).
So, with this in mind do you think we can continue working on this PR, or do you believe the GC APIs this PR proposes to add would not be sufficient once JDK-8296344 is implemented?

> (Also, tactically, we should probably first change G1 to use the new "Uniform API" you are thinking about, but leave the other collectors unchanged. This way, we can gradually test things out and fix the other collectors in subsequent RFEs).

That makes sense.
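The argument built on conditions 1 and 2 above can be checked with a small sketch. It is only an illustration under stated assumptions: region boundaries are taken to sit at absolute multiples of the region size, the archive is assumed to be mapped at a 1 MB-aligned address, and the class name, base address and object list are made up rather than taken from HotSpot.

```java
class RegionBoundarySketch {
    static final long MB = 1L << 20;

    /** True if the byte range [start, start + size) lies inside a single region. */
    static boolean fitsInOneRegion(long start, long size, long regionSize) {
        return start / regionSize == (start + size - 1) / regionSize;
    }

    public static void main(String[] args) {
        // Hypothetical mapping address: 1 MB aligned, but not aligned to larger region sizes.
        long base = 513 * MB;
        // {offset, size} pairs, packed at dump time so that none crosses a 1 MB boundary.
        long[][] objects = {
            {0, 4096}, {MB - 64, 64}, {MB, 300_000}, {3 * MB + 8, MB - 8}
        };
        // Try every power-of-two region size from 1 MB to 32 MB (an illustrative range).
        for (long regionSize = MB; regionSize <= 32 * MB; regionSize <<= 1) {
            for (long[] o : objects) {
                if (!fitsInOneRegion(base + o[0], o[1], regionSize)) {
                    throw new AssertionError("object crosses a region boundary");
                }
            }
        }
        // Counterexample: an object that straddles a 1 MB boundary fails the check.
        System.out.println(fitsInOneRegion(base + MB - 64, 128, MB)); // prints false
    }
}
```

The check passes because every boundary of a power-of-two region of at least 1 MB is itself a multiple of 1 MB, so it can never fall strictly inside a 1 MB-aligned chunk.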
Ideally I should have done the implementation for other collectors in a separate RFEs. But I was worried if I the new APIs are flexible enough to support other non-G1 policies, and in an attempt to verify that I added the support for those policies as well. If it helps I can remove those commits and deliver them later in subsequent RFEs. ------------- PR: https://git.openjdk.org/jdk/pull/10970 From psandoz at openjdk.org Mon Nov 7 20:00:39 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 7 Nov 2022 20:00:39 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v6] In-Reply-To: <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> Message-ID: <MgFjJBRGvsdwa5wwa7hjMfpsbnWqcb0hYtQjXCgqlPs=.ad91f74c-c5c6-4ceb-aca2-ceabc5c3ca92@github.com> On Mon, 7 Nov 2022 15:00:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Make memory session a pure lifetime abstraction src/java.base/share/classes/java/lang/foreign/MemorySession.java line 83: > 81: * MemorySegment segment = MemorySegment.allocateNative(100, MemorySession.implicit()); > 82: * ... 
> 83: * segment = null; // the segment session is unreacheable here and becomes available for implicit close Typo: Suggestion: * segment = null; // the segment session is unreachable here and becomes available for implicit close ------------- PR: https://git.openjdk.org/jdk/pull/10872 From jvernee at openjdk.org Mon Nov 7 20:09:27 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 7 Nov 2022 20:09:27 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v5] In-Reply-To: <7ZPsmtKqqOeItVnPztGyLhwuHi5Q9WsGI8SYzGkyL8Q=.33d63f63-9cbb-42bd-8d81-6555ed3d67d2@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <nLGKXMEfHApyxseQFp882Q-QXNxbkWcChh2Yv1COKG4=.70ef8fea-423b-4034-ad6e-026f2164f3c4@github.com> <7ZPsmtKqqOeItVnPztGyLhwuHi5Q9WsGI8SYzGkyL8Q=.33d63f63-9cbb-42bd-8d81-6555ed3d67d2@github.com> Message-ID: <dPN_w9M_H0fIYJxhMN_-wbSWaxh3BeKsczT6N589Eeo=.36df4534-d66a-4b48-a9af-cd6cd1627f9c@github.com> On Mon, 7 Nov 2022 14:59:27 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Bring windows CallArranger in sync with panama repo > > I have incorporated additional API changes, described in this document: > http://cr.openjdk.java.net/~mcimadamore/panama/session_arenas.html > > The main change is that `MemorySession` is now a pure lifetime abstraction and no longer implements `AutoCloseable`/`SegmentAllocator`. Instead a new abstraction, called `Arena`, should be used for deterministic deallocation use cases. This change allows several simplifications on the `MemorySession` API, as there's no more need to support non-closeable views.
@mcimadamore looks like your latest merge also undid the changes from your `b98febf` commit again ------------- PR: https://git.openjdk.org/jdk/pull/10872 From coleenp at openjdk.org Mon Nov 7 20:40:33 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 7 Nov 2022 20:40:33 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> Message-ID: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> > This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. > Tested with tier1-4, and jvmti and jdi tests locally. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: really revert the file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11023/files - new: https://git.openjdk.org/jdk/pull/11023/files/781b44fc..66e9133b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11023&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11023&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11023.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11023/head:pull/11023 PR: https://git.openjdk.org/jdk/pull/11023 From coleenp at openjdk.org Mon Nov 7 20:40:34 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 7 Nov 2022 20:40:34 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v2] In-Reply-To: <vfJLcnVI-70yd29yzL9txMyC71hB6-OAHwMeG8F9YfI=.79f9451f-ee42-4fd5-ba85-bac43c9ae4b3@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com>
<vfJLcnVI-70yd29yzL9txMyC71hB6-OAHwMeG8F9YfI=.79f9451f-ee42-4fd5-ba85-bac43c9ae4b3@github.com> Message-ID: <lAE1eD-JK4g_4b6UfdsiNSS4TpxJBhUjYBSGWLFQJxM=.c91a8644-b620-4b72-b2cb-c54b3ed032c2@github.com> On Mon, 7 Nov 2022 19:04:36 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Revert ClassLoaders.java change Thanks for spotting that Alan, I thought I'd reverted the entire file. ------------- PR: https://git.openjdk.org/jdk/pull/11023 From psandoz at openjdk.org Mon Nov 7 20:42:16 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 7 Nov 2022 20:42:16 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v6] In-Reply-To: <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> Message-ID: <s0VwUTJOI32CbUYWoy1AYtkQbnVZSiv_Uym4vhRpOxU=.19a6337a-bd1f-4b76-b410-a7a00af9318a@github.com> On Mon, 7 Nov 2022 15:00:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Make memory session a pure lifetime abstraction src/java.base/share/classes/java/lang/foreign/ValueLayout.java line 329: > 327: /** > 328: * Returns an <em>unbounded</em> address layout with the same carrier, alignment constraint, name and order as this address layout, > 329: * but with the specified pointee layout. An unbounded address layouts allow raw addresses to be accessed Suggestion: * but with the specified pointee layout. An unbounded address layout allow raw addresses to be accessed ------------- PR: https://git.openjdk.org/jdk/pull/10872 From psandoz at openjdk.org Mon Nov 7 20:46:37 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 7 Nov 2022 20:46:37 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v6] In-Reply-To: <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> Message-ID: <aCCxXd7YNUNb9EATYXyOIr-yjkqpwtGU86TG7l-YTqM=.4af1d051-af69-4116-a760-57948ff9424e@github.com> On Mon, 7 Nov 2022 15:00:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Make memory session a pure lifetime abstraction src/java.base/share/classes/java/lang/foreign/package-info.java line 103: > 101: * the memory session associated with the segment being accessed has not been closed prematurely. > 102: * We call this guarantee <em>temporal safety</em>. Together, spatial and temporal safety ensure that each memory access > 103: * operation either succeeds - and accesses a valid location of the region of memory backing the memory segment - or fails. Suggestion: * operation either succeeds - and accesses a valid location within the region of memory backing the memory segment - or fails. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From dholmes at openjdk.org Mon Nov 7 21:05:27 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 7 Nov 2022 21:05:27 GMT Subject: RFR: 8296401: ConcurrentHashTable::bulk_delete might miss to delete some objects In-Reply-To: <rWw3Hb1baG5eslMgFnOdEP1Q4avhfYc3uTpjdB9DaP0=.23f34ff0-eff9-421b-9e70-7f119fd90e5d@github.com> References: <rWw3Hb1baG5eslMgFnOdEP1Q4avhfYc3uTpjdB9DaP0=.23f34ff0-eff9-421b-9e70-7f119fd90e5d@github.com> Message-ID: <srzYQG8vQxXtFu8IYd4LzHoFZo6VR8_D_9e5wlek0u4=.c3fb4829-6de5-428e-9b70-f34aeba9c55f@github.com> On Fri, 4 Nov 2022 13:38:23 GMT, Leo Korinth <lkorinth at openjdk.org> wrote: > ConcurrentHashTable::bulk_delete might miss to delete some objects if a bucket has more than 256 entries. Current uses of ConcurrentHashTable are not harmed by this behaviour. > > I modified gtest:ConcurrentHashTable to detect the problem (first commit), and fixed the problem in the code (second commit). > > Tests passes tier1-3. Thanks for the explanation. What is the point of having a BULK_DELETE_LIMIT? 
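To make the failure mode from the quoted description concrete, here is a minimal Java model of a bounded bulk delete. It is a hypothetical sketch, not the HotSpot C++ `ConcurrentHashTable`: only the limit of 256 comes from the description above, while the class, the method names and the list-based "bucket" are assumptions made for illustration. A single bounded pass leaves entries behind once a bucket holds more deletable entries than the limit; repeating the pass until it removes fewer than the limit clears the bucket.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of a bounded bulk delete; not the HotSpot ConcurrentHashTable.
final class BulkDeleteSketch {
    // 256 is taken from the PR description; everything else here is an assumption.
    static final int BULK_DELETE_LIMIT = 256;

    // One bounded pass: removes at most BULK_DELETE_LIMIT matching entries
    // and returns how many were removed.
    static int deleteOnePass(List<Integer> bucket, IntPredicate shouldDelete) {
        int removed = 0;
        Iterator<Integer> it = bucket.iterator();
        while (removed < BULK_DELETE_LIMIT && it.hasNext()) {
            if (shouldDelete.test(it.next())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Repeats the bounded pass while a pass hits the limit, so nothing is missed.
    static int deleteAll(List<Integer> bucket, IntPredicate shouldDelete) {
        int total = 0;
        int removed;
        do {
            removed = deleteOnePass(bucket, shouldDelete);
            total += removed;
        } while (removed == BULK_DELETE_LIMIT);
        return total;
    }

    // Builds a bucket holding the values 0..n-1.
    static List<Integer> bucketOf(int n) {
        List<Integer> bucket = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            bucket.add(i);
        }
        return bucket;
    }

    public static void main(String[] args) {
        List<Integer> single = bucketOf(300);
        deleteOnePass(single, v -> true);
        System.out.println("left after one pass: " + single.size());          // 44

        List<Integer> repeated = bucketOf(300);
        deleteAll(repeated, v -> true);
        System.out.println("left after repeated passes: " + repeated.size()); // 0
    }
}
```

The model only shows why a fixed per-pass limit needs a retry loop (or an unbounded collection step); it says nothing about how the actual fix in the PR is implemented.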
------------- PR: https://git.openjdk.org/jdk/pull/10983 From iklam at openjdk.org Mon Nov 7 21:23:36 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 7 Nov 2022 21:23:36 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <naoq1y802BM2oWDp-qLDcaGWfHUx9egRUNgeNXoFuOM=.3a96c815-ddfb-4b32-8d35-29de94db7295@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> <TRoDLfcCCxNIGwWPb4W1eJtT7BAp2zjZhFjvSH0aleM=.1fa97bf3-ec5c-464b-86a1-7d320b1f1178@github.com> <naoq1y802BM2oWDp-qLDcaGWfHUx9egRUNgeNXoFuOM=.3a96c815-ddfb-4b32-8d35-29de94db7295@github.com> Message-ID: <E4vmiApqmu80hBu0GrQlPdpoJQt-HJellO_d_vWMKYo=.42ab8128-bfae-4c02-961a-196f625c327d@github.com> On Mon, 7 Nov 2022 19:56:09 GMT, Ashutosh Mehra <duke at openjdk.org> wrote: > While JDK-8296344 aims at decoupling the code at dump time, my aim with this PR is to achieve the same at run time by having GC-agnostic APIs. Moreover, the dump time mechanism should not affect the APIs used for mapping regions at run time (though the implementation may need to be adjusted). I think it depends on how we want to change the dump time operations. If we decide to go with a single contiguous block, then the API for mapping this block into the runtime heap will look very different than what you have today:

    bool ArchiveHeapLoader::get_heap_range_for_archive_regions(ArchiveHeapRegions* heap_regions, bool is_open) {
      if (Universe::heap()->alloc_archive_regions(heap_regions->dumptime_regions(),
                                                  heap_regions->num_regions(),
                                                  heap_regions->runtime_regions(),
                                                  is_open)) {

Also, we should probably record the region boundary information in the archived objects. Something like "objects never span across 1MB boundaries". This may need to be passed to the runtime mapping API, so incompatible collectors (e.g., one that uses 512KB regions) can reject the archived objects.
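The compatibility check suggested here could look roughly like the following Java sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, sizes are in bytes, both sizes are powers of two, and offsets are measured from a base address that is aligned to the runtime region size. Under those assumptions, objects that never cross a multiple of the recorded boundary also never cross a region boundary whenever the region size is a multiple of that boundary; a collector with smaller regions has to reject (or relocate) the archive.

```java
// Hypothetical sketch of a "recorded boundary" compatibility check; not HotSpot code.
final class ArchiveBoundaryCheck {
    static boolean isPowerOfTwo(long x) {
        return x > 0 && (x & (x - 1)) == 0;
    }

    // True if the object occupying [offset, offset + size) lies inside a single
    // block of blockSize bytes, i.e. it does not cross a multiple of blockSize.
    static boolean staysWithinBlock(long offset, long size, long blockSize) {
        return offset / blockSize == (offset + size - 1) / blockSize;
    }

    // A collector can map the archive as-is only if every one of its region
    // boundaries is also a multiple of the boundary recorded at dump time.
    static boolean collectorCanMap(long archivedBoundary, long runtimeRegionSize) {
        return isPowerOfTwo(archivedBoundary)
            && isPowerOfTwo(runtimeRegionSize)
            && runtimeRegionSize % archivedBoundary == 0;
    }

    public static void main(String[] args) {
        final long KB = 1024L;
        final long MB = 1024L * KB;

        System.out.println(collectorCanMap(MB, 512 * KB)); // false: regions smaller than the boundary
        System.out.println(collectorCanMap(MB, MB));       // true
        System.out.println(collectorCanMap(MB, 32 * MB));  // true

        // An object at offset 400KB with size 200KB respects the 1MB boundary
        // but straddles a 512KB boundary, which is why the 512KB case is rejected.
        System.out.println(staysWithinBlock(400 * KB, 200 * KB, MB));       // true
        System.out.println(staysWithinBlock(400 * KB, 200 * KB, 512 * KB)); // false
    }
}
```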
One of my goals for JDK-8296344 is to optimize the archived objects for the collector chosen at dump time. For example, if you dump with SerialGC, the archived objects can be mapped efficiently without relocation when SerialGC is also chosen at run time, but may require relocation if G1 is chosen at run time. I am not sure how this would affect the runtime mapping API. Maybe some sort of preference would need to be indicated. I think it would be best for us to think about the whole picture before committing to a design. Timing-wise, I think we missed the JDK 20 release anyway, so we should have plenty of time to come up with a good design for JDK 21. I would also like to hear from folks in our GC team. @tschatzl @stefank ------------- PR: https://git.openjdk.org/jdk/pull/10970 From luhenry at openjdk.org Mon Nov 7 21:50:41 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 7 Nov 2022 21:50:41 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v6] In-Reply-To: <oY2bAmngcp-GLIZ5lNvhPD6s-itQDnbP-kWtvUfCePo=.c0de9fe6-a062-4ff2-a158-de96a1e8df4c@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <hVVzJNW5i13eZ3g_aRqJJjuhs98ZOwJdAl4M2gvFNcU=.858ce35f-5e93-4dba-8ef5-1ce307ce3922@github.com> <YOhpqH4hXpsngfmHMhoe6kPPssW1Zz8AxhcemX9AOTA=.41aca39f-ec96-459c-9917-2f32106ade70@github.com> <qrGTLK-U1f0Dgdgq8dLofSoVWW00UBYAuVz8m2hOjPQ=.e2e18141-41ea-43ea-a215-ba2e6f832d80@github.com> <oY2bAmngcp-GLIZ5lNvhPD6s-itQDnbP-kWtvUfCePo=.c0de9fe6-a062-4ff2-a158-de96a1e8df4c@github.com> Message-ID: <TFF7iTSJjPdkuF35PXZ9o3ZZ6UHxEL-FzTsruwAfrRA=.7e3cdeee-9e02-4695-9759-f0e31b5c0b14@github.com> On Mon, 7 Nov 2022 14:46:08 GMT, Yadong Wang <yadongwang at openjdk.org> wrote: >> I'm not sure I understand what you're suggesting. > > I mean we should handle the situation that disp beyonds 12-bit range in addi or prefetch.w. How about we add an assert for now?
That `disp` value is driven by `AllocatePrefetchDistance`, `AllocatePrefetchStepSize`, and `AllocatePrefetchLines`/`AllocateInstancePrefetchLines`. Given the default value of these config variables, the maximum `disp` would be `3*CacheLineSize + max(3,1) * CacheLineSize` which is (on most platforms) 384 bytes. If you max-out the number of lines to prefetch, you'd get `3*CacheLineSize + 64 * CacheLineSize` or 4288 bytes which indeed requires more than 12 bit of storage. Either we can safeguard against that in `vm_version_riscv.cpp` (given these values are quite nonsensical in themselves, given the cost/benefit of doing so many prefetches or so far). Either we can complexify the code generation here in that case. ------------- PR: https://git.openjdk.org/jdk/pull/10884 From sspitsyn at openjdk.org Mon Nov 7 22:51:22 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 7 Nov 2022 22:51:22 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <TXQZE_h6l-EnaBYOQWsEfhPNu50aXQYjzv6YSRU72jw=.3a20f09f-ba1a-4c63-9f46-4dd35584cb21@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file > We might not need this. appendClasspath is thread safe so it's okay for several > agents calling appendToSystemClassLoaderSearch at around the same time. 
> I don't think it needs to synchronize with anything else. I'm thinking about a good place to put a comment about this. Probably, it can be placed before the method `appendToClassPathForInstrumentation`. ------------- PR: https://git.openjdk.org/jdk/pull/11023 From coleenp at openjdk.org Tue Nov 8 01:05:24 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 01:05:24 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call Message-ID: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. Tested with tier 1-4. ------------- Commit messages: - 8296492: Remove ObjectLocker in JVMTI get_subgroups call Changes: https://git.openjdk.org/jdk/pull/11033/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296492 Stats: 109 lines in 5 files changed: 30 ins; 67 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/11033.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11033/head:pull/11033 PR: https://git.openjdk.org/jdk/pull/11033 From fjiang at openjdk.org Tue Nov 8 01:06:23 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 8 Nov 2022 01:06:23 GMT Subject: RFR: 8296435: RISC-V: Small refactoring for increment/decrement In-Reply-To: <nyVf2hIFJ2QnB0XiU2ZhbAuvoG_hF2AKdB98Ze2Ehrc=.2fd1aa14-f007-4dc0-b9ce-7c091c84bf34@github.com> References: <YkfeG7lrC_tZlPiWq7J7oyWM2o8xnp6i-SGR7mKVyO8=.ef2bbd2d-844d-4ac9-9d70-a35c4a6dd494@github.com> <nyVf2hIFJ2QnB0XiU2ZhbAuvoG_hF2AKdB98Ze2Ehrc=.2fd1aa14-f007-4dc0-b9ce-7c091c84bf34@github.com> Message-ID: <h_TRo_Kh3fLmwl2de4pfIS9xAVgkz5UbcH5ofUONKgU=.6c8f88d1-395e-420f-95ca-51b14488fabe@github.com> On Mon, 7 Nov 2022 01:45:36 GMT, Fei Yang <fyang at openjdk.org> wrote: >> The `increment` and `decrement` use t1 as tmp register, while
t1 was the flag register in c2. >> We can make tmp registers as the arguments of `increment` and `decrement` so that c2 can reuse them. >> >> Testing: >> >> full tier1 tests passed on Linux-riscv64 HiFive Unmatched board with release build > > Looks good. Thanks. @RealFYang -- Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/11005 From fjiang at openjdk.org Tue Nov 8 01:16:05 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 8 Nov 2022 01:16:05 GMT Subject: Integrated: 8296435: RISC-V: Small refactoring for increment/decrement In-Reply-To: <YkfeG7lrC_tZlPiWq7J7oyWM2o8xnp6i-SGR7mKVyO8=.ef2bbd2d-844d-4ac9-9d70-a35c4a6dd494@github.com> References: <YkfeG7lrC_tZlPiWq7J7oyWM2o8xnp6i-SGR7mKVyO8=.ef2bbd2d-844d-4ac9-9d70-a35c4a6dd494@github.com> Message-ID: <-lpF94IswrhxOdvAi3A-oA4b86DStbVswuDy29Fkmxk=.ce861d48-123e-4253-b54d-bac2317f4139@github.com> On Sun, 6 Nov 2022 01:42:50 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: > The `increment` and `decrement` use t1 as tmp register, while t1 was the flag register in c2. > We can make tmp registers as the arguments of `increment` and `decrement` so that c2 can reuse them. > > Testing: > > full tier1 tests passed on Linux-riscv64 HiFive Unmatched board with release build This pull request has now been integrated.
Changeset: 4c80dff2 Author: Feilong Jiang <fjiang at openjdk.org> Committer: Fei Yang <fyang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/4c80dff2cab8bc0fcfeca8d21754a28e31e92325 Stats: 42 lines in 3 files changed: 0 ins; 6 del; 36 mod 8296435: RISC-V: Small refactoring for increment/decrement Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/11005 From dzhang at openjdk.org Tue Nov 8 02:35:13 2022 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 8 Nov 2022 02:35:13 GMT Subject: RFR: 8296447: RISC-V: Make the operands order of vrsub_vx/vrsub_vi consistent with RVV 1.0 spec [v3] In-Reply-To: <arChiUBGra90rJKJdaDfUOpyTPRThqevpkrAPRja3cM=.e800c3f5-8bcc-4770-9c56-b9ec955bb760@github.com> References: <arChiUBGra90rJKJdaDfUOpyTPRThqevpkrAPRja3cM=.e800c3f5-8bcc-4770-9c56-b9ec955bb760@github.com> Message-ID: <b5uJ61i8Vv0hzg4GOyY36NyYy8md5g4GBW25lvwU_1M=.b62317ef-30ac-409f-a8a7-d602ad4931ed@github.com> > Hi, > > At the moment, the operands order of `vrsub_vx` and ` vrsub_vi` is not the same as in the RVV1.0 spec[1]. These instructions use the wrong assembly syntax pattern for vector binary arithmetic instructions (multiply-add)[2]. > > `vrsub_vx` was classified as `Vector Single-Width Integer Add and Subtract` in rvv1.0 spec, but is currently classified as `Vector Single-Width Integer Multiply-Add Instructions` and generate the functions under the corresponding macros, which results in the reverse order of the operands `Vs2` and `Rs1` compared to the spec. > > `vrsub_vi` has its own separate macro definition to generate the corresponding function and the order of these operands(`Vs2` and `imm`) is reversed too. > > I think it is better to adjust the operands order of these two instructions to be consistent with the spec. > > Please take a look and have some reviews. Thanks a lot. 
> > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > [2] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#101-vector-arithmetic-instruction-encoding > > ## Testing: > > - hotspot and jdk tier1 on unmatched board without new failures > - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu > - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Fix operands of the macro patch_VArith ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11009/files - new: https://git.openjdk.org/jdk/pull/11009/files/0930a669..adeb8e21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11009&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11009&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11009/head:pull/11009 PR: https://git.openjdk.org/jdk/pull/11009 From fyang at openjdk.org Tue Nov 8 02:35:13 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 8 Nov 2022 02:35:13 GMT Subject: RFR: 8296447: RISC-V: Make the operands order of vrsub_vx/vrsub_vi consistent with RVV 1.0 spec [v3] In-Reply-To: <b5uJ61i8Vv0hzg4GOyY36NyYy8md5g4GBW25lvwU_1M=.b62317ef-30ac-409f-a8a7-d602ad4931ed@github.com> References: <arChiUBGra90rJKJdaDfUOpyTPRThqevpkrAPRja3cM=.e800c3f5-8bcc-4770-9c56-b9ec955bb760@github.com> <b5uJ61i8Vv0hzg4GOyY36NyYy8md5g4GBW25lvwU_1M=.b62317ef-30ac-409f-a8a7-d602ad4931ed@github.com> Message-ID: <c8V7CZm12cjAttaTsb-ECwveY9wqbF3OIAJnsOQ5FPk=.ff10f92a-10cd-40e7-81be-ff67374629b7@github.com> On Tue, 8 Nov 2022 02:31:16 GMT, Dingli Zhang <dzhang at openjdk.org> wrote: >> Hi, >> >> At the moment, the operands order of `vrsub_vx` and ` vrsub_vi` is not the same as in the RVV1.0 spec[1]. 
These instructions use the wrong assembly syntax pattern for vector binary arithmetic instructions (multiply-add)[2]. >> >> `vrsub_vx` was classified as `Vector Single-Width Integer Add and Subtract` in rvv1.0 spec, but is currently classified as `Vector Single-Width Integer Multiply-Add Instructions` and generate the functions under the corresponding macros, which results in the reverse order of the operands `Vs2` and `Rs1` compared to the spec. >> >> `vrsub_vi` has its own separate macro definition to generate the corresponding function and the order of these operands(`Vs2` and `imm`) is reversed too. >> >> I think it is better to adjust the operands order of these two instructions to be consistent with the spec. >> >> Please take a look and have some reviews. Thanks a lot. >> >> >> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc >> [2] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#101-vector-arithmetic-instruction-encoding >> >> ## Testing: >> >> - hotspot and jdk tier1 on unmatched board without new failures >> - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu >> - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Fix operands of the macro patch_VArith Looks fine. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11009 From dzhang at openjdk.org Tue Nov 8 02:48:36 2022 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 8 Nov 2022 02:48:36 GMT Subject: RFR: 8296447: RISC-V: Make the operands order of vrsub_vx/vrsub_vi consistent with RVV 1.0 spec [v2] In-Reply-To: <GPMDsrypVIG1tygjT_C_RLif14n9K77AT_wBN5mYAso=.8f8dca7c-6599-4638-8f6d-13761efb585b@github.com> References: <arChiUBGra90rJKJdaDfUOpyTPRThqevpkrAPRja3cM=.e800c3f5-8bcc-4770-9c56-b9ec955bb760@github.com> <I7ynViT1NGZhHBBGb37Twqt9grDcUDGB9eA9aOdMe38=.bfbd99ec-2134-4b0c-9f96-a50beda79dde@github.com> <GPMDsrypVIG1tygjT_C_RLif14n9K77AT_wBN5mYAso=.8f8dca7c-6599-4638-8f6d-13761efb585b@github.com> Message-ID: <H8b-oVyMyEElnl9Gdk9DHUySdfagK7O89qpUP20Jx48=.6552c3b9-2991-4334-bf69-bad751b7a3de@github.com> On Mon, 7 Nov 2022 07:37:06 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove duplicate macro definition > > I also verified it matches with https://github.com/riscv/riscv-opcodes/blob/master/rv_v @luhenry @RealFYang Thanks for the review! 
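For reference, the operand order discussed in this thread follows the RVV 1.0 assembly forms `vrsub.vx vd, vs2, rs1` and `vrsub.vi vd, vs2, imm`, which compute a reverse subtract: `vd[i] = x[rs1] - vs2[i]` and `vd[i] = imm - vs2[i]`. The following is a minimal Java model of those semantics for 32-bit elements. It is a sketch, not the HotSpot assembler: the class and method names are hypothetical, and masking and the vector length setting are ignored.

```java
import java.util.Arrays;

// Hypothetical scalar model of the RVV reverse-subtract instructions (32-bit elements).
final class VrsubModel {
    // Models vrsub.vx vd, vs2, rs1: vd[i] = rs1 - vs2[i].
    // Parameter order mirrors the spec's source operand order (vs2 first, then rs1).
    static int[] vrsubVx(int[] vs2, int rs1) {
        int[] vd = new int[vs2.length];
        for (int i = 0; i < vs2.length; i++) {
            vd[i] = rs1 - vs2[i]; // Java int arithmetic wraps like a 32-bit element
        }
        return vd;
    }

    // Models vrsub.vi vd, vs2, imm: vd[i] = imm - vs2[i], with a 5-bit signed immediate.
    static int[] vrsubVi(int[] vs2, int imm) {
        if (imm < -16 || imm > 15) {
            throw new IllegalArgumentException("imm must fit in a signed 5-bit field");
        }
        return vrsubVx(vs2, imm);
    }

    public static void main(String[] args) {
        int[] vs2 = {1, 2, 3};
        System.out.println(Arrays.toString(vrsubVx(vs2, 10))); // [9, 8, 7]
        System.out.println(Arrays.toString(vrsubVi(vs2, 0)));  // [-1, -2, -3]
    }
}
```

Because the operation is not commutative, swapping the two source operands in an assembler helper's signature silently changes which value is subtracted from which at every call site, which is why matching the spec's order is worthwhile.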
------------- PR: https://git.openjdk.org/jdk/pull/11009 From yadongwang at openjdk.org Tue Nov 8 02:50:37 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Tue, 8 Nov 2022 02:50:37 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v6] In-Reply-To: <TFF7iTSJjPdkuF35PXZ9o3ZZ6UHxEL-FzTsruwAfrRA=.7e3cdeee-9e02-4695-9759-f0e31b5c0b14@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <hVVzJNW5i13eZ3g_aRqJJjuhs98ZOwJdAl4M2gvFNcU=.858ce35f-5e93-4dba-8ef5-1ce307ce3922@github.com> <YOhpqH4hXpsngfmHMhoe6kPPssW1Zz8AxhcemX9AOTA=.41aca39f-ec96-459c-9917-2f32106ade70@github.com> <qrGTLK-U1f0Dgdgq8dLofSoVWW00UBYAuVz8m2hOjPQ=.e2e18141-41ea-43ea-a215-ba2e6f832d80@github.com> <oY2bAmngcp-GLIZ5lNvhPD6s-itQDnbP-kWtvUfCePo=.c0de9fe6-a062-4ff2-a158-de96a1e8df4c@github.com> <TFF7iTSJjPdkuF35PXZ9o3ZZ6UHxEL-FzTsruwAfrRA=.7e3cdeee-9e02-4695-9759-f0e31b5c0b14@github.com> Message-ID: <GLEozuL5Ipj_a0Zoyuj6cj0lGi1nyZwStWRPSQVPu3o=.4afc8b31-bfbf-4fc0-9add-09ef7e844955@github.com> On Mon, 7 Nov 2022 21:48:38 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> I mean we should handle the situation that disp beyonds 12-bit range in addi or prefetch.w. > > How about we add an assert for now? That `disp` value is driven by `AllocatePrefetchDistance`, `AllocatePrefetchStepSize`, and `AllocatePrefetchLines`/`AllocateInstancePrefetchLines`. Given the default value of these config variables, the maximum `disp` would be `3*CacheLineSize + max(3,1) * CacheLineSize` which is (on most platforms) 384 bytes. If you max-out the number of lines to prefetch, you'd get `3*CacheLineSize + 64 * CacheLineSize` or 4288 bytes which indeed requires more than 12 bit of storage. > > Either we can safeguard against that in `vm_version_riscv.cpp` (given these values are quite nonsensical in themselves, given the cost/benefit of doing so many prefetches or so far). 
Either we can complexify the code generation here in that case. I'm inclined to support the second option. ------------- PR: https://git.openjdk.org/jdk/pull/10884 From dzhang at openjdk.org Tue Nov 8 02:51:42 2022 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 8 Nov 2022 02:51:42 GMT Subject: Integrated: 8296447: RISC-V: Make the operands order of vrsub_vx/vrsub_vi consistent with RVV 1.0 spec In-Reply-To: <arChiUBGra90rJKJdaDfUOpyTPRThqevpkrAPRja3cM=.e800c3f5-8bcc-4770-9c56-b9ec955bb760@github.com> References: <arChiUBGra90rJKJdaDfUOpyTPRThqevpkrAPRja3cM=.e800c3f5-8bcc-4770-9c56-b9ec955bb760@github.com> Message-ID: <XA0wKZ1COKcU_lmXyEyYvF78pmhkv-mA8GTVM01twKc=.24108d82-b82e-4cbd-8a07-0f424a7f2bcf@github.com> On Mon, 7 Nov 2022 01:59:25 GMT, Dingli Zhang <dzhang at openjdk.org> wrote: > Hi, > > At the moment, the operands order of `vrsub_vx` and ` vrsub_vi` is not the same as in the RVV1.0 spec[1]. These instructions use the wrong assembly syntax pattern for vector binary arithmetic instructions (multiply-add)[2]. > > `vrsub_vx` was classified as `Vector Single-Width Integer Add and Subtract` in rvv1.0 spec, but is currently classified as `Vector Single-Width Integer Multiply-Add Instructions` and generate the functions under the corresponding macros, which results in the reverse order of the operands `Vs2` and `Rs1` compared to the spec. > > `vrsub_vi` has its own separate macro definition to generate the corresponding function and the order of these operands(`Vs2` and `imm`) is reversed too. > > I think it is better to adjust the operands order of these two instructions to be consistent with the spec. > > Please take a look and have some reviews. Thanks a lot. 
> > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > [2] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#101-vector-arithmetic-instruction-encoding > > ## Testing: > > - hotspot and jdk tier1 on unmatched board without new failures > - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu > - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu This pull request has now been integrated. Changeset: 1169dc06 Author: Dingli Zhang <dzhang at openjdk.org> Committer: Fei Yang <fyang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/1169dc066c0257da1a237960b8c0cc4782ef8d14 Stats: 21 lines in 3 files changed: 1 ins; 11 del; 9 mod 8296447: RISC-V: Make the operands order of vrsub_vx/vrsub_vi consistent with RVV 1.0 spec Reviewed-by: luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/11009 From jnimeh at openjdk.org Tue Nov 8 04:07:32 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Tue, 8 Nov 2022 04:07:32 GMT Subject: RFR: 8247645: ChaCha20 intrinsics In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> Message-ID: <_YnjdPpV7hzrWhrcM1Y_LG9IM26Nz-LHJWPr5UXPfB4=.03fe38dd-3b72-42ad-9a2b-42fd80fc136a@github.com> On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: > This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: > > - x86_64: AVX, AVX2 and AVX512 > - aarch64: platforms that support the advanced SIMD instructions > > Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. > > Special thanks to the folks who have made many helpful comments while this PR was in draft form. 
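For readers who want to exercise the code path being accelerated, the sketch below runs a ChaCha20 encrypt/decrypt round trip through the standard `javax.crypto` API (`Cipher.getInstance("ChaCha20")` with `ChaCha20ParameterSpec`). It is an illustration only, not the benchmark used in this PR: the class name, the all-zero key and nonce, and the counter value are assumptions made for the demo, and a real application must never reuse a key and nonce pair. Whether the intrinsic is actually used depends on the JVM build, the CPU and the JIT, which this code does not control.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.ChaCha20ParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class: ChaCha20 round trip via the standard JCA API.
final class ChaCha20RoundTrip {
    // Applies ChaCha20 in the given mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE).
    // key must be 32 bytes and nonce 12 bytes.
    static byte[] apply(int mode, byte[] key, byte[] nonce, int counter, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("ChaCha20");
            cipher.init(mode,
                        new SecretKeySpec(key, "ChaCha20"),
                        new ChaCha20ParameterSpec(nonce, counter));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("ChaCha20 operation failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[32];   // demo-only all-zero key
        byte[] nonce = new byte[12]; // demo-only all-zero nonce
        byte[] plain = "hotspot-dev".getBytes(StandardCharsets.UTF_8);

        byte[] encrypted = apply(Cipher.ENCRYPT_MODE, key, nonce, 1, plain);
        byte[] decrypted = apply(Cipher.DECRYPT_MODE, key, nonce, 1, encrypted);

        System.out.println("round trip ok: " + Arrays.equals(plain, decrypted));
    }
}
```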
Microbenchmark results (Note: ChaCha20-Poly1305 numbers do not include the pending Poly1305 intrinsics to be delivered in #10582)

x86_64
Processor: 4x Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz

Java only (-XX:-UseChaCha20Intrinsics)
--------------------------------------
Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
ChaCha20.decrypt                 256  thrpt   40   772956.829 ±  4434.965  ops/s
ChaCha20.decrypt                1024  thrpt   40   230478.075 ±   660.617  ops/s
ChaCha20.decrypt                4096  thrpt   40    61504.367 ±   187.485  ops/s
ChaCha20.decrypt               16384  thrpt   40    15671.893 ±    59.860  ops/s
ChaCha20.encrypt                 256  thrpt   40   793708.698 ±  3587.562  ops/s
ChaCha20.encrypt                1024  thrpt   40   232413.842 ±   808.766  ops/s
ChaCha20.encrypt                4096  thrpt   40    61586.483 ±    94.821  ops/s
ChaCha20.encrypt               16384  thrpt   40    15749.637 ±    34.497  ops/s
ChaCha20Poly1305.decrypt         256  thrpt   40   219991.514 ±  2117.364  ops/s
ChaCha20Poly1305.decrypt        1024  thrpt   40   101672.568 ±  1921.214  ops/s
ChaCha20Poly1305.decrypt        4096  thrpt   40    32582.073 ±   946.061  ops/s
ChaCha20Poly1305.decrypt       16384  thrpt   40     8485.793 ±    26.348  ops/s
ChaCha20Poly1305.encrypt         256  thrpt   40   291605.327 ±  2893.898  ops/s
ChaCha20Poly1305.encrypt        1024  thrpt   40   121034.948 ±  2545.312  ops/s
ChaCha20Poly1305.encrypt        4096  thrpt   40    32657.343 ±   114.322  ops/s
ChaCha20Poly1305.encrypt       16384  thrpt   40     8527.834 ±    33.711  ops/s

Intrinsics enabled (-XX:UseAVX=1)
---------------------------------
Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
ChaCha20.decrypt                 256  thrpt   40  1293211.662 ±  9833.892  ops/s
ChaCha20.decrypt                1024  thrpt   40   450135.559 ±  1614.303  ops/s
ChaCha20.decrypt                4096  thrpt   40   123675.797 ±   576.160  ops/s
ChaCha20.decrypt               16384  thrpt   40    31707.566 ±    93.988  ops/s
ChaCha20.encrypt                 256  thrpt   40  1338667.215 ± 12012.240  ops/s
ChaCha20.encrypt                1024  thrpt   40   453682.363 ±  2559.322  ops/s
ChaCha20.encrypt                4096  thrpt   40   124785.645 ±   394.535  ops/s
ChaCha20.encrypt               16384  thrpt   40    31788.969 ±    90.770  ops/s
ChaCha20Poly1305.decrypt         256  thrpt   40   250683.639 ±  3990.340  ops/s
ChaCha20Poly1305.decrypt        1024  thrpt   40   131000.144 ±  2895.410  ops/s
ChaCha20Poly1305.decrypt        4096  thrpt   40    45215.542 ±  1368.148  ops/s
ChaCha20Poly1305.decrypt       16384  thrpt   40    11879.307 ±    55.006  ops/s
ChaCha20Poly1305.encrypt         256  thrpt   40   355255.774 ±  5397.267  ops/s
ChaCha20Poly1305.encrypt        1024  thrpt   40   156057.380 ±  4294.091  ops/s
ChaCha20Poly1305.encrypt        4096  thrpt   40    47016.845 ±  1618.779  ops/s
ChaCha20Poly1305.encrypt       16384  thrpt   40    12113.919 ±    45.792  ops/s

Intrinsics enabled (-XX:UseAVX=2)
---------------------------------
Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
ChaCha20.decrypt                 256  thrpt   40  1824729.604 ± 12130.198  ops/s
ChaCha20.decrypt                1024  thrpt   40   746024.477 ±  3921.472  ops/s
ChaCha20.decrypt                4096  thrpt   40   219662.823 ±  2128.901  ops/s
ChaCha20.decrypt               16384  thrpt   40    57198.868 ±   221.973  ops/s
ChaCha20.encrypt                 256  thrpt   40  1893810.127 ± 21870.718  ops/s
ChaCha20.encrypt                1024  thrpt   40   758024.511 ±  5414.552  ops/s
ChaCha20.encrypt                4096  thrpt   40   224032.805 ±   935.309  ops/s
ChaCha20.encrypt               16384  thrpt   40    58112.296 ±   498.048  ops/s
ChaCha20Poly1305.decrypt         256  thrpt   40   260529.149 ±  4298.662  ops/s
ChaCha20Poly1305.decrypt        1024  thrpt   40   144967.984 ±  4558.697  ops/s
ChaCha20Poly1305.decrypt        4096  thrpt   40    50047.575 ±   171.204  ops/s
ChaCha20Poly1305.decrypt       16384  thrpt   40    13976.999 ±    72.299  ops/s
ChaCha20Poly1305.encrypt         256  thrpt   40   378971.408 ±  9324.721  ops/s
ChaCha20Poly1305.encrypt        1024  thrpt   40   179361.248 ±  7968.109  ops/s
ChaCha20Poly1305.encrypt        4096  thrpt   40    55727.145 ±  2860.765  ops/s
ChaCha20Poly1305.encrypt       16384  thrpt   40    14205.830 ±    59.411  ops/s

Intrinsics enabled (-XX:UseAVX=3)
---------------------------------
Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
ChaCha20.decrypt                 256  thrpt   40  1182958.956 ±  7782.532  ops/s
ChaCha20.decrypt                1024  thrpt   40  1003530.400 ± 10315.996  ops/s
ChaCha20.decrypt                4096  thrpt   40   339428.341 ±  2376.804  ops/s
ChaCha20.decrypt               16384  thrpt   40    92903.498 ±  1112.425  ops/s
ChaCha20.encrypt                 256  thrpt   40  1266584.736 ±  5101.597  ops/s
ChaCha20.encrypt                1024  thrpt   40  1059717.173 ±  9435.649  ops/s
ChaCha20.encrypt                4096  thrpt   40   350520.581 ±  2787.593  ops/s
ChaCha20.encrypt               16384  thrpt   40    95181.548 ±  1638.579  ops/s
ChaCha20Poly1305.decrypt         256  thrpt   40   200722.479 ±  2045.896  ops/s
ChaCha20Poly1305.decrypt        1024  thrpt   40   124660.386 ±  3869.517  ops/s
ChaCha20Poly1305.decrypt        4096  thrpt   40    44059.327 ±   143.765  ops/s
ChaCha20Poly1305.decrypt       16384  thrpt   40    12412.936 ±    54.845  ops/s
ChaCha20Poly1305.encrypt         256  thrpt   40   274528.005 ±  2945.416  ops/s
ChaCha20Poly1305.encrypt        1024  thrpt   40   145146.188 ±   857.254  ops/s
ChaCha20Poly1305.encrypt        4096  thrpt   40    47045.637 ±   128.049  ops/s
ChaCha20Poly1305.encrypt       16384  thrpt   40    12643.929 ±    55.748  ops/s

aarch64
Processor: 2 x CPU implementer : 0x41, architecture: 8, variant : 0x3, part : 0xd0c, revision : 1

Java only (-XX:-UseChaCha20Intrinsics)
--------------------------------------
Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
ChaCha20.decrypt                 256  thrpt   40  1301037.920 ±  1734.836  ops/s
ChaCha20.decrypt                1024  thrpt   40   387115.013 ±  1122.264  ops/s
ChaCha20.decrypt                4096  thrpt   40   102591.108 ±   229.456  ops/s
ChaCha20.decrypt               16384  thrpt   40    25878.583 ±    89.351  ops/s
ChaCha20.encrypt                 256  thrpt   40  1332737.880 ±  2478.508  ops/s
ChaCha20.encrypt                1024  thrpt   40   390288.663 ±  2361.851  ops/s
ChaCha20.encrypt                4096  thrpt   40   101882.728 ±   744.907  ops/s
ChaCha20.encrypt               16384  thrpt   40    26001.888 ±    71.907  ops/s
ChaCha20Poly1305.decrypt         256  thrpt   40   351189.393 ±  2209.148  ops/s
ChaCha20Poly1305.decrypt        1024  thrpt   40   142960.999 ±   361.619  ops/s
ChaCha20Poly1305.decrypt        4096  thrpt   40    42437.822 ±    85.557  ops/s
ChaCha20Poly1305.decrypt       16384  thrpt   40    11173.152 ±    24.969  ops/s
ChaCha20Poly1305.encrypt         256  thrpt   40   444870.664 ± 12571.799  ops/s
ChaCha20Poly1305.encrypt        1024  thrpt   40   158481.143 ±  2149.208  ops/s
ChaCha20Poly1305.encrypt        4096  thrpt   40    43610.721 ±   282.795  ops/s
ChaCha20Poly1305.encrypt       16384  thrpt   40    11150.783 ±    27.911  ops/s

Intrinsics enabled
------------------
Benchmark                 (dataSize)   Mode  Cnt        Score       Error  Units
ChaCha20.decrypt                 256  thrpt   40  1907215.648 ±  3163.767  ops/s
ChaCha20.decrypt                1024  thrpt   40   631804.007 ±   736.430  ops/s
ChaCha20.decrypt                4096  thrpt   40   172280.991 ±   362.190  ops/s
ChaCha20.decrypt               16384  thrpt   40    44150.254 ±    98.927  ops/s
ChaCha20.encrypt                 256  thrpt   40  1990050.859 ±  6380.625  ops/s
ChaCha20.encrypt                1024  thrpt   40   636574.405 ±  3332.471  ops/s
ChaCha20.encrypt                4096  thrpt   40   173258.615 ±   327.199  ops/s
ChaCha20.encrypt               16384  thrpt   40    44191.925 ±    72.996  ops/s
ChaCha20Poly1305.decrypt         256  thrpt   40   360555.774 ±  1988.467  ops/s
ChaCha20Poly1305.decrypt        1024  thrpt   40   162093.489 ±   413.684  ops/s
ChaCha20Poly1305.decrypt        4096  thrpt   40    50799.888 ±   110.955  ops/s
ChaCha20Poly1305.decrypt       16384  thrpt   40    13560.165 ±    32.208  ops/s
ChaCha20Poly1305.encrypt         256  thrpt   40   458079.724 ± 13746.235  ops/s
ChaCha20Poly1305.encrypt        1024  thrpt   40   188228.966 ±  3498.480  ops/s
ChaCha20Poly1305.encrypt        4096  thrpt   40    52665.733 ±   151.740  ops/s
ChaCha20Poly1305.encrypt       16384  thrpt   40    13606.192 ±    52.134  ops/s

-------------

PR: https://git.openjdk.org/jdk/pull/7702

From jnimeh at openjdk.org Tue Nov 8 04:11:55 2022
From: jnimeh at openjdk.org (Jamil Nimeh)
Date: Tue, 8 Nov 2022 04:11:55 GMT
Subject: RFR: 8247645: ChaCha20 intrinsics [v2]
In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com>
Message-ID: <w91UdYDSds4Xa8dFfMwQaMMrjzVpjwS1ZcI8AbHM1JU=.66b990ff-c509-4d22-b426-9be51e1a3606@github.com>

> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce.
Intrinsics have been written for the following platforms and instruction sets: > > - x86_64: AVX, AVX2 and AVX512 > - aarch64: platforms that support the advanced SIMD instructions > > Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. > > Special thanks to the folks who have made many helpful comments while this PR was in draft form. Jamil Nimeh has updated the pull request incrementally with six additional commits since the last revision: - Change intrinsic helper method name conform to convention - consolidate chacha macroAssembler routines into chacha stubGenerator file - More indentation fixes on aarch64 - rename chapoly->chacha for macro file - rename chacha macro file to be consistent with x86_64 naming - Fix indentation issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/7702/files - new: https://git.openjdk.org/jdk/pull/7702/files/c79abe34..53b432e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=7702&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=7702&range=00-01 Stats: 826 lines in 9 files changed: 212 ins; 254 del; 360 mod Patch: https://git.openjdk.org/jdk/pull/7702.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7702/head:pull/7702 PR: https://git.openjdk.org/jdk/pull/7702 From jnimeh at openjdk.org Tue Nov 8 04:13:30 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Tue, 8 Nov 2022 04:13:30 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v2] In-Reply-To: <ZUxveAxaS5YNw-Dt6LtOKkPe9Sya2c59PaBObFKKTIw=.f66f0a2c-c2c3-4382-96b6-cf1f47172349@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <ZUxveAxaS5YNw-Dt6LtOKkPe9Sya2c59PaBObFKKTIw=.f66f0a2c-c2c3-4382-96b6-cf1f47172349@github.com> Message-ID: <CUgMRm3tLXJQcxsWn1cwpLVBCKsz_Ad3Qy5xauo0naI=.3715cb24-23dd-47d8-ad89-451cacff6299@github.com> On Mon, 7 Nov 2022 18:02:43 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Jamil Nimeh has 
updated the pull request incrementally with six additional commits since the last revision: >> >> - Change intrinsic helper method name conform to convention >> - consolidate chacha macroAssembler routines into chacha stubGenerator file >> - More indentation fixes on aarch64 >> - rename chapoly->chacha for macro file >> - rename chacha macro file to be consistent with x86_64 naming >> - Fix indentation issues > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 989: > >> 987: bool multi_block); >> 988: >> 989: // ChaCha20-Poly1305 macroAssembler defs > > These methods can also be moved to `stubGenerator_x86_64.hpp`/`stubGenerator_x86_64_chacha.cpp`. There are no other usages besides x86-64-specific CC20 stub. Done, and removed `macroAssembler_x86_chacha.cpp` since it is no longer needed. > src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 870: > >> 868: */ >> 869: @IntrinsicCandidate >> 870: private static int _chaCha20Block(int[] initState, byte[] result) { > > Seems like there are 2 major naming conventions for intrinsic helper methods: prepend "impl" (e.g, `CounterMode.implCrypt`) or append "0" (`GaloisCounterMode.implGCMCrypt0`). I'd prefer to see either one used here. Done. 
------------- PR: https://git.openjdk.org/jdk/pull/7702 From dholmes at openjdk.org Tue Nov 8 04:25:27 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 8 Nov 2022 04:25:27 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <VaO06Ysa0S_mTihCqs4TmZRngVEeuWj_HysYBV61-Y8=.305b073e-965f-4682-94b4-8af36a7c509e@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file Note the loader involved need not be our own default platform loader and we don't know how a custom system loader might operate. The specification for this says nothing about thread-safety, nor that the VM will do any locking, so I think it is okay to remove it - but it is a change in behaviour that should be documented by a CSR request. 
------------- PR: https://git.openjdk.org/jdk/pull/11023 From dholmes at openjdk.org Tue Nov 8 04:49:24 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 8 Nov 2022 04:49:24 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call In-Reply-To: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> Message-ID: <pyVBMbAXWvOcHcrjJhE5TXK4uXdYd4rMXo-jKtqjNZI=.96e20218-354a-4911-b3f3-e0a70aaed854@github.com> On Tue, 8 Nov 2022 00:58:44 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. > Tested with tier 1-4. This is a nice simplification of the VM side of the code! I do have to wonder why this was implemented the way it was rather than doing the simple upcall as you now do, but I suspect it was just performance, which should not be an issue today. One query/concern about exception handling. Thanks. src/hotspot/share/prims/jvmtiEnvBase.cpp line 810: > 808: if (HAS_PENDING_EXCEPTION) { > 809: CLEAR_PENDING_EXCEPTION; > 810: return JVMTI_ERROR_OUT_OF_MEMORY; Do we need to handle unexpected exceptions better, rather than just claiming they are OOME? 
------------- PR: https://git.openjdk.org/jdk/pull/11033 From fyang at openjdk.org Tue Nov 8 05:47:33 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 8 Nov 2022 05:47:33 GMT Subject: RFR: 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null In-Reply-To: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> References: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> Message-ID: <mRULpVQzEAItTWnntRexFCuPp7yd_CWibE2YZUP7WZs=.f7998207-f0d1-4773-affb-984bb5ae2016@github.com> On Mon, 7 Nov 2022 04:03:57 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: > Please see the JBS issue for more crash details. > > To reproduce using a cross-compiled build: > > # dump one cds-nocoops.jsa > <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:dump -Xlog:cds* -version > > # reproduce > <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:on -XX:-TieredCompilation \ > -Xlog:cds* -Xlog:gc+metaspace=info -jar renaissance-gpl-0.14.1.jar -r 1 movie-lens > > > `MacroAssembler::en/decode_klass_not_null` uses the heapbase register as a temp register in the interpreter, which may kill the in-use value when enabling C2 compilation and `UseCompressedClassPointers` meanwhile disabling `UseCompressedOops`. C1 won't have this issue for the xheapbase is not its allocation candidate. When CDS is enabled, the narrow klass base is mapped to some address like `0x0000000800000000`, so `MacroAssembler::decode_klass_not_null`, which lacks registers, will use `xheapbase` as a temp to load the klass base and kill the register in the interpreter. So adding a `-XX:+DeoptimizeALot` can speedily reproduce the issue. > > To solve this, we shall decouple the xheapbase used as a temp register in `MacroAssembler::en/decode_klass_not_null`. AArch64 has advanced instructions so one register is enough to handle the logic. 
But in RISC-V we require at least two. > > This patch introduces another argument `tmp` to related functions to decouple and eliminate such heapbase usages. One thing that deserves noticing is the `cmp_klass` case, which usually gets used at the beginning of a method entry when `t1` is used as ic holder klass and `t0` is occupied there. These positions are special since nearly all registers are usable except ones used for arguments and special purposes (thread register, etc.). I propose to use a call-clobbered `t2` register here, to keep aligning the `i2c2i_adapter` logic[1]. > > Tested hotspot tier1~4 on QEMU; jdk tier1~tier2 and hotspot tier1~tier2 on my Hifive unmatched board, and the reproducible movie-lens benchmark. > > Thanks, > Xiaolin > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L629 Thanks for fixing this. Several minor nits. src/hotspot/cpu/riscv/methodHandles_riscv.cpp line 81: > 79: __ beqz(obj, L_bad); > 80: __ push_reg(RegSet::of(temp, temp2), sp); > 81: __ load_klass(temp, obj, temp2); Could you please help rename 'temp' to 'temp1' at the same time? src/hotspot/cpu/riscv/riscv.ad line 1741: > 1739: > 1740: Label skip; > 1741: __ cmp_klass(j_rarg0, t1, t0, t2 /* as a tmp */, skip); You might also want to add comment for 't0' here: /* as a temp */ src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp line 1409: > 1407: const Register ic_reg = t1; > 1408: const Register receiver = j_rarg0; > 1409: const Register tmp_reg = t2; No need to introduce 'tmp_reg' here. 
I think you can use 't2' directly like the change you made in file: src/hotspot/cpu/riscv/riscv.ad

-------------

PR: https://git.openjdk.org/jdk/pull/11010

From stuefe at openjdk.org Tue Nov 8 07:19:26 2022
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 8 Nov 2022 07:19:26 GMT
Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing
In-Reply-To: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com>
References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com>
Message-ID: <XOlvQrrM4SgopFY-unuI164XZH9htKVXadO2LKv1QUE=.f0f49ee6-17ef-499c-8b99-16f8d02a338a@github.com>

On Mon, 7 Nov 2022 13:24:26 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration.
>
> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed.
>
> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values.
>
> Enables the following
> ```C++
>   REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized())
>     os::print_register_info_header(st, _context);
>
>     REENTRANT_LOOP_START(os::print_nth_register_info_max_index())
>       // decode register contents if possible
>       ResourceMark rm(_thread);
>       os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context);
>     REENTRANT_LOOP_END
>
>     st->cr();
> ```
>
> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local)

Hi Axel,

I am not sure this is a good idea tbh, for two reasons:

Each time we crash out in VMError, we build up the stack.
We never ever unwind that stack, e.g. via longjmp, since that would introduce other errors (e.g. abandoned locks). There is a natural limit to how many recursive crashes we can handle since the stack is not endless. Each secondary crash increases the risk of running into guard pages and spoiling the game for follow-up STEPs. Therefore we limit the number of allowed recursive crashes. This limit also serves a second purpose: if we crash that often, maybe we should just stop already and let the process die.

That brings me to the second problem, which is time. When we crash, we want to go down as fast as possible, e.g. to allow a server to restart the node. OTOH we want a nice hs-err file. Therefore the time that error handling is allowed to take is carefully limited. See `ErrorLogTimeout`: by default 2 minutes, though our customers usually lower this to 30 seconds or even less. Each STEP has a timeout, set to a fraction of that total limit (a quarter). A quarter gives us room for 2-3 hanging STEPs and still leaves enough breathing room for the remainder of the STEPs.

If you now increase the number of STEPs, all these calculations are off. We may hit the recursive error limit much sooner, since every individual register printout may crash. And if they hang, they may eat up the ErrorLogTimeout much sooner. So we will get more torn hs-err files with "recursive limit reached, giving up" or "timeout reached, giving up".

Note that one particularly fragile piece of information is the debug info, e.g. function names, since printing it relies on parsing debugging information. In our experience that can crash or hang often, especially if the debug info has to be read from file or network.
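To make the time-budget argument concrete, here is a small back-of-the-envelope sketch in C++. It assumes, as described in this mail, that each STEP may use at most a quarter of `ErrorLogTimeout`, and it further assumes that a hanging STEP consumes its whole share. The function names are made up for illustration and are not HotSpot code; the real implementation may compute these values differently.

```cpp
#include <cassert>

// Hypothetical helpers, not HotSpot code. All times are in seconds.

// Per-STEP timeout: a quarter of the total error-reporting budget.
inline long step_timeout_secs(long error_log_timeout_secs) {
    return error_log_timeout_secs / 4;
}

// Budget left for the remaining STEPs after `hung_steps` STEPs each ran
// into their full per-STEP timeout. Never goes below zero.
inline long remaining_budget_secs(long error_log_timeout_secs, int hung_steps) {
    long used = step_timeout_secs(error_log_timeout_secs) * hung_steps;
    long left = error_log_timeout_secs - used;
    return left > 0 ? left : 0;
}

// Sanity checks for the two totals mentioned in the mail (2 minutes, 30 seconds).
inline void check_budget_examples() {
    assert(step_timeout_secs(120) == 30);
    assert(remaining_budget_secs(120, 2) == 60);
    assert(remaining_budget_secs(120, 3) == 30);
    assert(remaining_budget_secs(120, 4) == 0);
    assert(remaining_budget_secs(30, 3) == 9);  // 30 / 4 == 7 with integer division
}
```

Under these assumptions, with the default of 120 seconds two hanging STEPs leave 60 seconds and three leave 30. If each register printout were given the same per-step timeout, four hanging printouts alone would use up the whole budget.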
Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/11017 From stuefe at openjdk.org Tue Nov 8 07:25:28 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 8 Nov 2022 07:25:28 GMT Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability In-Reply-To: <XwV6rI2R4xqqtqX2hCql_RjuusRwVL6SsF-iH9JhKeQ=.bc57c2bf-00e4-4c0c-af4d-fa3cfe0c055b@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> <bEqxnLEmzJXZ1KNSpMSSoPlNA2wIIzeS285A2RNUfQE=.eacdea94-afc1-460f-bb44-bfedfc96a39b@github.com> <XwV6rI2R4xqqtqX2hCql_RjuusRwVL6SsF-iH9JhKeQ=.bc57c2bf-00e4-4c0c-af4d-fa3cfe0c055b@github.com> Message-ID: <Uu0KX_34xIFmweIm5ullJMrwsLmQb6YgktGipym0Kyw=.2cddcad5-b6ea-43be-8cbd-8d727c7e58d5@github.com> On Mon, 7 Nov 2022 16:28:34 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> src/hotspot/share/utilities/vmError.cpp line 593: >> >>> 591: controlled_crash(TestCrashInErrorHandler); >>> 592: return true; >>> 593: }()) >> >> I don't think this is necessary. The tests for secondary crash handling are exhaustive, there is no reason why it should differ from crash condition handling outside of a crash condition. If the signal handler is correctly set up, it should work. If it is not, the prior step is sufficient. > >> I don't think it adds much readability, but it is a wide-spread patch that will add unnecessary noise for us backporters. Just my opinion, lets see what others think. > > I was worried about that too. This patch initially came about when I was writing #11017 which required rewriting the macro logic and add extra nested scope. The fact that the common case for a step was checking the verbose flag made me refactor out just the verbose case. And then I just refactored out the rest the few extra conditions as well. > > But I am pretty ambivalent about this change. #11017 contains the newline changes as well. 
So if there are more opinions that this makes back porting work harder it can be dropped. > > >> I don't think this is necessary. The tests for secondary crash handling are exhaustive, there is no reason why it should differ from crash condition handling outside of a crash condition. If the signal handler is correctly set up, it should work. If it is not, the prior step is sufficient. > > I added it because in my initial change (9449501f063e7662a530adf921ae53dadd0312a0), a theoretical crash in a condition would recover but not proceed to the next step. The test change is to avoid such a regression if someone thinks of doing the same simplification of the macro condition logic. (It would crash before `_current_step` was updated.) Ah, I understand. But you changed this to two nested ifs. So this is no problem anymore. I still don't think it is needed with your new version. IIUC it prevents us from the problem you describe if someone changes the STEP macros in the future *and* has a very complex if statement that would crash. ------------- PR: https://git.openjdk.org/jdk/pull/11018 From luhenry at openjdk.org Tue Nov 8 07:34:35 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 8 Nov 2022 07:34:35 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v7] In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> Message-ID: <USqZ-hj48jmO86DWscqbw3IxSfhfVKO1V_TBJ7SJVqc=.d42b651d-9191-4134-a60f-01f7f3093c57@github.com> > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. 
> > It passes `hotspot:tier1` test suite Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: support large disp values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10884/files - new: https://git.openjdk.org/jdk/pull/10884/files/51725fd4..42a61aa8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=05-06 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10884.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10884/head:pull/10884 PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Tue Nov 8 07:34:35 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 8 Nov 2022 07:34:35 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v6] In-Reply-To: <GLEozuL5Ipj_a0Zoyuj6cj0lGi1nyZwStWRPSQVPu3o=.4afc8b31-bfbf-4fc0-9add-09ef7e844955@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <hVVzJNW5i13eZ3g_aRqJJjuhs98ZOwJdAl4M2gvFNcU=.858ce35f-5e93-4dba-8ef5-1ce307ce3922@github.com> <YOhpqH4hXpsngfmHMhoe6kPPssW1Zz8AxhcemX9AOTA=.41aca39f-ec96-459c-9917-2f32106ade70@github.com> <qrGTLK-U1f0Dgdgq8dLofSoVWW00UBYAuVz8m2hOjPQ=.e2e18141-41ea-43ea-a215-ba2e6f832d80@github.com> <oY2bAmngcp-GLIZ5lNvhPD6s-itQDnbP-kWtvUfCePo=.c0de9fe6-a062-4ff2-a158-de96a1e8df4c@github.com> <TFF7iTSJjPdkuF35PXZ9o3ZZ6UHxEL-FzTsruwAfrRA=.7e3cdeee-9e02-4695-9759-f0e31b5c0b14@github.com> <GLEozuL5Ipj_a0Zoyuj6cj0lGi1nyZwStWRPSQVPu3o=.4afc8b31-bfbf-4fc0-9add-09ef7e844955@github.com> Message-ID: <xVZtMCYMnGBjp4cIZ25LesLLfuOPAFmU8LghSRyVIlg=.3dcaf5e3-5fbc-4557-ab38-6ea121acc52b@github.com> On Tue, 8 Nov 2022 02:46:56 GMT, Yadong Wang <yadongwang at openjdk.org> wrote: >> How about we add an assert for now? 
>> That `disp` value is driven by `AllocatePrefetchDistance`, `AllocatePrefetchStepSize`, and `AllocatePrefetchLines`/`AllocateInstancePrefetchLines`. Given the default value of these config variables, the maximum `disp` would be `3*CacheLineSize + max(3,1) * CacheLineSize` which is (on most platforms) 384 bytes. If you max out the number of lines to prefetch, you'd get `3*CacheLineSize + 64 * CacheLineSize` or 4288 bytes, which indeed requires more than 12 bits of storage.
>>
>> Either we can safeguard against that in `vm_version_riscv.cpp` (given these values are quite nonsensical in themselves, given the cost/benefit of doing so many prefetches or so far), or we can complexify the code generation here in that case.
>
> I'm inclined to support the second option.

Fixed.

-------------

PR: https://git.openjdk.org/jdk/pull/10884

From stuefe at openjdk.org Tue Nov 8 07:53:13 2022
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 8 Nov 2022 07:53:13 GMT
Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing
In-Reply-To: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com>
References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com>
Message-ID: <k3HSBi7FcvuLz-tNwnYEefMLlTMZ6xdBtivzmOTAWwI=.e005234b-f7d6-4e31-bb0d-eef87d82e473@github.com>

On Mon, 7 Nov 2022 13:24:26 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration.
>
> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed.
>
> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values.
>
> Enables the following
> ```C++
>   REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized())
>     os::print_register_info_header(st, _context);
>
>     REENTRANT_LOOP_START(os::print_nth_register_info_max_index())
>       // decode register contents if possible
>       ResourceMark rm(_thread);
>       os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context);
>     REENTRANT_LOOP_END
>
>     st->cr();
> ```
>
> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local)

P.S. I realize all this should be put down as a comment in the code, to save other developers wasting time. Since it is really not obvious at all. I'll put that on my list.

https://bugs.openjdk.org/browse/JDK-8296513

-------------

PR: https://git.openjdk.org/jdk/pull/11017

From stuefe at openjdk.org Tue Nov 8 07:56:10 2022
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 8 Nov 2022 07:56:10 GMT
Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability
In-Reply-To: <Uu0KX_34xIFmweIm5ullJMrwsLmQb6YgktGipym0Kyw=.2cddcad5-b6ea-43be-8cbd-8d727c7e58d5@github.com>
References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> <bEqxnLEmzJXZ1KNSpMSSoPlNA2wIIzeS285A2RNUfQE=.eacdea94-afc1-460f-bb44-bfedfc96a39b@github.com> <XwV6rI2R4xqqtqX2hCql_RjuusRwVL6SsF-iH9JhKeQ=.bc57c2bf-00e4-4c0c-af4d-fa3cfe0c055b@github.com> <Uu0KX_34xIFmweIm5ullJMrwsLmQb6YgktGipym0Kyw=.2cddcad5-b6ea-43be-8cbd-8d727c7e58d5@github.com>
Message-ID: <TOl2JLGU-8zkY8LY0G9FxXdFJB3CLxHwOIOXLA2p-LI=.31763b67-428c-4919-8907-ab0cfbfdb6b6@github.com>

On Tue, 8 Nov 2022 07:23:20 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>>> I don't think it adds much readability, but it is a wide-spread patch that will add unnecessary noise for us backporters. Just my opinion, lets see what others think.
>>
>> I was worried about that too.
This patch initially came about when I was writing #11017 which required rewriting the macro logic and add extra nested scope. The fact that the common case for a step was checking the verbose flag made me refactor out just the verbose case. And then I just refactored out the rest the few extra conditions as well. >> >> But I am pretty ambivalent about this change. #11017 contains the newline changes as well. So if there are more opinions that this makes back porting work harder it can be dropped. >> >> >>> I don't think this is necessary. The tests for secondary crash handling are exhaustive, there is no reason why it should differ from crash condition handling outside of a crash condition. If the signal handler is correctly set up, it should work. If it is not, the prior step is sufficient. >> >> I added it because in my initial change (9449501f063e7662a530adf921ae53dadd0312a0), a theoretical crash in a condition would recover but not proceed to the next step. The test change is to avoid such a regression if someone thinks of doing the same simplification of the macro condition logic. (It would crash before `_current_step` was updated.) > > Ah, I understand. But you changed this to two nested ifs. So this is no problem anymore. > > I still don't think it is needed with your new version. IIUC it prevents us from the problem you describe if someone changes the STEP macros in the future *and* has a very complex if statement that would crash. About the STEP_IF, the more I look at it, the more I like it. So I am ambivalent. Let's hear what others say (@coleenp?). 
------------- PR: https://git.openjdk.org/jdk/pull/11018 From alanb at openjdk.org Tue Nov 8 08:01:28 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 8 Nov 2022 08:01:28 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <VaO06Ysa0S_mTihCqs4TmZRngVEeuWj_HysYBV61-Y8=.305b073e-965f-4682-94b4-8af36a7c509e@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> <VaO06Ysa0S_mTihCqs4TmZRngVEeuWj_HysYBV61-Y8=.305b073e-965f-4682-94b4-8af36a7c509e@github.com> Message-ID: <KGPJYXsstshKbmm7zbDl5Pia5tC-gOGzdNiptm2pAUc=.6fd04da6-359e-4d1f-a6f5-37c3a86bc9df@github.com> On Tue, 8 Nov 2022 04:23:13 GMT, David Holmes <dholmes at openjdk.org> wrote: > Note the loader involved need not be our own default platform loader and we don't know how a custom system loader might operate. The specification for this says nothing about thread-safety, nor that the VM will do any locking, so I think it is okay to remove it - but it is a change in behaviour that should be documented by a CSR request. In that configuration, the custom system class loader will be created with the app class loader as its parent. It's still the app class loader that loads from the class path and it will be the app class loader's appendToClassPathForInstrumentation method that is called (there's no equivalent for custom system class loaders). I would hope there are tests for this but the discussion makes me wonder what SystemDictionary::java_system_loader() returns, esp. given a custom class loader won't be created until initPhase3. 
------------- PR: https://git.openjdk.org/jdk/pull/11023 From xlinzheng at openjdk.org Tue Nov 8 09:00:38 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 8 Nov 2022 09:00:38 GMT Subject: RFR: 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null [v2] In-Reply-To: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> References: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> Message-ID: <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com> > Please see the JBS issue for more crash details. > > To reproduce using a cross-compiled build: > > # dump one cds-nocoops.jsa > <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:dump -Xlog:cds* -version > > # reproduce > <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:on -XX:-TieredCompilation \ > -Xlog:cds* -Xlog:gc+metaspace=info -jar renaissance-gpl-0.14.1.jar -r 1 movie-lens > > > `MacroAssembler::en/decode_klass_not_null` uses the heapbase register as a temp register in the interpreter, which may kill the in-use value when enabling C2 compilation and `UseCompressedClassPointers` meanwhile disabling `UseCompressedOops`. C1 won't have this issue for the xheapbase is not its allocation candidate. When CDS is enabled, the narrow klass base is mapped to some address like `0x0000000800000000`, so `MacroAssembler::decode_klass_not_null`, which lacks registers, will use `xheapbase` as a temp to load the klass base and kill the register in the interpreter. So adding a `-XX:+DeoptimizeALot` can speedily reproduce the issue. > > To solve this, we shall decouple the xheapbase used as a temp register in `MacroAssembler::en/decode_klass_not_null`. AArch64 has advanced instructions so one register is enough to handle the logic. But in RISC-V we require at least two. 
> > This patch introduces another argument `tmp` to related functions to decouple and eliminate such heapbase usages. One thing that deserves noticing is the `cmp_klass` case, which usually gets used at the beginning of a method entry when `t1` is used as ic holder klass and `t0` is occupied there. These positions are special since nearly all registers are usable except ones used for arguments and special purposes (thread register, etc.). I propose to use a call-clobbered `t2` register here, to keep aligning the `i2c2i_adapter` logic[1]. > > Tested hotspot tier1~4 on QEMU; jdk tier1~tier2 and hotspot tier1~tier2 on my Hifive unmatched board, and the reproducible movie-lens benchmark. > > Thanks, > Xiaolin > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L629 Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: Fix as to comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11010/files - new: https://git.openjdk.org/jdk/pull/11010/files/a1ede724..0f7ce34c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11010&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11010&range=00-01 Stats: 13 lines in 4 files changed: 0 ins; 1 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/11010.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11010/head:pull/11010 PR: https://git.openjdk.org/jdk/pull/11010 From xlinzheng at openjdk.org Tue Nov 8 09:00:39 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 8 Nov 2022 09:00:39 GMT Subject: RFR: 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null [v2] In-Reply-To: <mRULpVQzEAItTWnntRexFCuPp7yd_CWibE2YZUP7WZs=.f7998207-f0d1-4773-affb-984bb5ae2016@github.com> References: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> 
<mRULpVQzEAItTWnntRexFCuPp7yd_CWibE2YZUP7WZs=.f7998207-f0d1-4773-affb-984bb5ae2016@github.com> Message-ID: <zbk5d85_v6UIhrd4vyA9DOLOBrqr96Xizbu_FeXUt78=.69575adc-f4c2-46c5-9f87-3289562d597e@github.com> On Tue, 8 Nov 2022 05:41:21 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix as to comments > > src/hotspot/cpu/riscv/riscv.ad line 1741: > >> 1739: >> 1740: Label skip; >> 1741: __ cmp_klass(j_rarg0, t1, t0, t2 /* as a tmp */, skip); > > You might also want to add comment for 't0' here: /* as a temp */ The other two are done, and the comments like this are renamed to a friendlier `/* call-clobbered t2 as a tmp */`. t0 is always used freely as a tmp register, I guess it is not necessary to add a comment for t0? It seems only the newly-added t2 here matters. ------------- PR: https://git.openjdk.org/jdk/pull/11010 From sspitsyn at openjdk.org Tue Nov 8 09:02:40 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 8 Nov 2022 09:02:40 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call In-Reply-To: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> Message-ID: <SEAoxhU6nKvAqgIa0vm7pDa8YGWyJNK436icRmZM1j4=.f257809b-8793-46e8-91c0-b50b77eaa391@github.com> On Tue, 8 Nov 2022 00:58:44 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. > Tested with tier 1-4. This looks nice in general. But I will make another pass. 
------------- PR: https://git.openjdk.org/jdk/pull/11033 From gcao at openjdk.org Tue Nov 8 09:25:03 2022 From: gcao at openjdk.org (Gui Cao) Date: Tue, 8 Nov 2022 09:25:03 GMT Subject: RFR: 8296515: RISC-V: Optimized MaxReductionV/MinReductionV/AddReductionV node implementation Message-ID: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> HI, The MaxReductionV, MinReductionV, AddReductionV nodes currently implemented by riscv rvv can be implemented by calling shared functions, and the T_BYTE and T_SHORT types in the MaxReductionV and MinReductionV node implementations can also be implemented in the same way as the T_INT type. Please take a look and have some reviews. Thanks a lot. ## Testing: - hotspot and jdk tier1 on unmatched board without new failures - test/jdk/jdk/incubator/vector/* with fastdebug on qemu ------------- Commit messages: - Optimized MaxReductionV/MinReductionV/AddReductionV node implementation Changes: https://git.openjdk.org/jdk/pull/11036/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11036&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296515 Stats: 173 lines in 4 files changed: 14 ins; 114 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/11036.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11036/head:pull/11036 PR: https://git.openjdk.org/jdk/pull/11036 From aboldtch at openjdk.org Tue Nov 8 09:30:28 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 8 Nov 2022 09:30:28 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <XOlvQrrM4SgopFY-unuI164XZH9htKVXadO2LKv1QUE=.f0f49ee6-17ef-499c-8b99-16f8d02a338a@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> <XOlvQrrM4SgopFY-unuI164XZH9htKVXadO2LKv1QUE=.f0f49ee6-17ef-499c-8b99-16f8d02a338a@github.com> Message-ID: 
<Mjp6ZW_cmhFj3ZXVyoMfTSa6XIkm9CAW4L28apiHAOM=.1bd16f14-2413-4ed9-858e-1a1c924bc283@github.com> On Tue, 8 Nov 2022 07:17:12 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > Each time we crash out in VMError, we build up the stack. We never ever unwind that stack, eg. via longjmp, since that would introduce other errors (e.g. abandoned locks). There is a natural limit to how many recursive crashes we can handle since the stack is not endless. Each secondary crash increases the risk of running into guard pages and spoiling the game for follow-up STEPs. I have no experience with stack depth being the problem in crashes (with the caveat that I have only run with this patch for a few weeks), but I have experienced cases where only an hs_err file was available and the register print_location bailed out early, and missing the rest of the output was unfavourable. > Therefore we limit the number of allowed recursive crashes. This limit also serves a second purpose: if we crash that often, maybe we should just stop already and let the process die. Fair, though I am curious how we want to decide this limit, and why ~60 is fine but ~90 would be too much (I am guessing that most steps have no, or only a very small, possibility of crashing). Maybe this should instead be solved with a general solution which stops the reporting if some retry limit is reached. Also the common case is that it does not crash repeatedly, and if it does, that is the scenario where I personally really would want the information, because something is seriously wrong. But maybe not at the cost of stack overflows; if that is a problem, maybe some stack address limit can be used to disable reentry in reentrant steps. > That brings me to the second problem, which is time. When we crash, we want to go down as fast as possible, e.g. allow a server to restart the node. OTOH we want a nice hs-err file. Therefore the time error handling is allowed to take is carefully limited.
See `ErrorLogTimeout`: by default 2 Minutes, though our customers usually lower this to 30 seconds or even lower. > > Each STEP has a timeout, set to a fraction of that total limit (A quarter). A quarter gives us room for 2-3 hanging STEPS and still leaves enough breathing room for the remainder of the STEPS. > > If you now increase the number of STEPS, all these calculations are off. We may hit the recursive error limit much sooner, since every individual register printout may crash. And if they hang, they may eat up the ErrorLogTimeout much sooner. So we will get more torn hs-err files with "recursive limit reached, giving up" or "timeout reached, giving up". The timeout problem was something I thought about as well, and I think you are correct, and that we should treat the whole reentrant step as one timeout. (Same behaviour as before). > Note that one particularly fragile information is the printing of debug info, e.g. function name, etc. Since that relies on some parsing of debugging information. In our experience that can crash out or hang often, especially if the debug info has to be read from file or network. > Alright, I see this as an argument for reentrant steps with one timeout for all iterations of the inner loop combined. I've heard opinions of something similar to reentrant steps in other parts of the hs_err printing. Like stack frame printing, where you can have iterative stages where each stage builds up more detailed information, until it crashes. And then prints what information it got so far. 
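To make the "one timeout for all iterations" idea above concrete, here is a loose Java analogy. It is only a sketch under stated assumptions: the class `ReentrantStep` and its `run` method are hypothetical, exceptions stand in for secondary crashes, and none of this is the actual `VMError::report` code. It also checks the deadline only between iterations, so unlike the real error-reporting watchdog it cannot interrupt an iteration that hangs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical illustration only; not HotSpot code.
final class ReentrantStep {
    // Runs body.accept(i) for i in [0, count). A failing iteration is recorded and
    // skipped, and the loop resumes at the next index. A single deadline is shared
    // by all iterations, so the step as a whole has one time budget.
    static List<String> run(int count, long budgetNanos, IntConsumer body) {
        List<String> log = new ArrayList<>();
        long deadline = System.nanoTime() + budgetNanos;
        for (int i = 0; i < count; i++) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                log.add("timeout reached, giving up at iteration " + i);
                break;
            }
            try {
                body.accept(i);
                log.add("iteration " + i + " ok");
            } catch (RuntimeException e) {
                // Stand-in for a secondary crash: keep what was printed so far, move on.
                log.add("iteration " + i + " failed: " + e.getMessage());
            }
        }
        return log;
    }
}
```

With this shape, a failure in one iteration (say, printing one register) does not lose the output of the remaining iterations, while the total time spent in the step stays bounded by a single budget.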
------------- PR: https://git.openjdk.org/jdk/pull/11017 From luhenry at openjdk.org Tue Nov 8 09:51:38 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 8 Nov 2022 09:51:38 GMT Subject: RFR: 8296515: RISC-V: Optimized MaxReductionV/MinReductionV/AddReductionV node implementation In-Reply-To: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> Message-ID: <_TAOYTzfq92gnfJl4jbWo5Z7GusV-Rf2HY728GjZZR8=.16d0a295-780e-4393-9a03-5a10eb8b0a0e@github.com> On Tue, 8 Nov 2022 09:16:22 GMT, Gui Cao <gcao at openjdk.org> wrote: > HI, > > The MaxReductionV, MinReductionV, AddReductionV nodes currently implemented by riscv rvv can be implemented by calling shared functions, and the T_BYTE and T_SHORT types in the MaxReductionV and MinReductionV node implementations can also be implemented in the same way as the T_INT type. > > Please take a look and have some reviews. Thanks a lot. > > ## Testing: > - hotspot and jdk tier1 on unmatched board without new failures > - test/jdk/jdk/incubator/vector/* with fastdebug on qemu src/hotspot/cpu/riscv/riscv_v.ad line 821: > 819: ins_encode %{ > 820: BasicType bt = Matcher::vector_element_basic_type(this, $src2); > 821: __ reduce_operation($dst$$Register, as_VectorRegister($tmp$$reg), Given the predicate `Matcher::vector_element_basic_type(n->in(2)) != T_LONG`, is there a risk it's going to match for types that don't fit the assert at https://github.com/openjdk/jdk/pull/11036/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6R1696 ? 
It seems the same approach as `reduce_addI` should be taken with the following predicate: predicate(Matcher::vector_element_basic_type(n->in(2)) == T_BYTE || Matcher::vector_element_basic_type(n->in(2)) == T_SHORT || Matcher::vector_element_basic_type(n->in(2)) == T_INT); src/hotspot/cpu/riscv/riscv_v.ad line 855: > 853: ins_encode %{ > 854: BasicType bt = Matcher::vector_element_basic_type(this, $src2); > 855: __ rvv_reduce_integral($dst$$Register, as_VectorRegister($tmp$$reg), Same as https://github.com/openjdk/jdk/pull/11036#discussion_r1016358443 src/hotspot/cpu/riscv/riscv_v.ad line 889: > 887: ins_encode %{ > 888: BasicType bt = Matcher::vector_element_basic_type(this, $src2); > 889: __ rvv_reduce_integral($dst$$Register, as_VectorRegister($tmp$$reg), Same as https://github.com/openjdk/jdk/pull/11036#discussion_r1016358443 ------------- PR: https://git.openjdk.org/jdk/pull/11036 From alanb at openjdk.org Tue Nov 8 10:05:18 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 8 Nov 2022 10:05:18 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call In-Reply-To: <pyVBMbAXWvOcHcrjJhE5TXK4uXdYd4rMXo-jKtqjNZI=.96e20218-354a-4911-b3f3-e0a70aaed854@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <pyVBMbAXWvOcHcrjJhE5TXK4uXdYd4rMXo-jKtqjNZI=.96e20218-354a-4911-b3f3-e0a70aaed854@github.com> Message-ID: <x9nTEgrP1wqihO1r0k4QLo_HcA1M3z3awFub5AVzuTk=.5e80aedc-3bf5-44a8-a5a1-5c76586e8662@github.com> On Tue, 8 Nov 2022 04:45:17 GMT, David Holmes <dholmes at openjdk.org> wrote: > This is a nice simplification of the VM side of the code! 
I do have to wonder why this was implemented the way it was rather than doing the simple upcall as you now do, but I suspect it was just performance, I don't think JVMTI GetThreadGroupChildren was ever performance critical; it's just that historically JVMTI functions haven't done many upcalls because of the potential side effects of events generated when executing Java code. I think the horse has bolted on this already, e.g. the JVMTI functions added in Java 9 all require calling into Java code. So I think it is a good simplification but a reminder that the set of "Universal errors" in the JVMTI spec may not be sufficient to cover the possible exceptions. As you point out, the current patch maps all errors/exceptions to JVMTI_ERROR_OUT_OF_MEMORY. It's the most likely reason for it to fail but it might stack overflow or something else. I note that JvmtiEnv::AddToSystemClassLoaderSearch maps errors/exceptions to JVMTI_ERROR_INTERNAL. Maybe we need an issue in JBS to track examining this, as it's not specific to GetThreadGroupChildren. ------------- PR: https://git.openjdk.org/jdk/pull/11033 From alanb at openjdk.org Tue Nov 8 10:09:32 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 8 Nov 2022 10:09:32 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call In-Reply-To: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> Message-ID: <-IUgUziZB4xaQHdy5yU4HFWyH9CP9VzMkH6MS-QATBY=.ae540ff2-d629-4459-bc5a-bb0da8e7f784@github.com> On Tue, 8 Nov 2022 00:58:44 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. > Tested with tier 1-4.
src/java.base/share/classes/java/lang/ThreadGroup.java line 802: > 800: int length = groups.size(); > 801: return groups.toArray(new ThreadGroup[length]); > 802: } Would you mind changing the comment to "Returns a snapshot of the subgroups as an array, used by JVMTI.", only to be consistent with the other methods that return subgroups as they all document it as a "snapshot". Also I think I would rename this to subgroupsAsArray as there isn't a "subgroups array". If you want, `return groups.toArray(new ThreadGroup[0]);` will do the sizing for you. Minor nit, you've additional blank lines after this method, I assume you only meant to add one. ------------- PR: https://git.openjdk.org/jdk/pull/11033 From mcimadamore at openjdk.org Tue Nov 8 10:36:49 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 8 Nov 2022 10:36:49 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v7] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <oDftcc-CmPklo43VUAFWL4Vu5xc4tmlu6CNTXfHyrHg=.ef6d5a1b-4baf-4608-9b83-1f08aef9561f@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with seven additional commits since the last revision: - Update src/java.base/share/classes/java/lang/foreign/MemorySession.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> - Update src/java.base/share/classes/java/lang/foreign/MemorySegment.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> - Update src/java.base/share/classes/java/lang/foreign/MemorySegment.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> - Update src/java.base/share/classes/java/lang/foreign/MemoryLayout.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> - Update src/java.base/share/classes/java/lang/foreign/Arena.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> - Update src/java.base/share/classes/java/lang/foreign/Arena.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> - Bring windows CallArranger in sync with panama repo (again) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/f04be0da..cc4ff582 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=05-06 Stats: 31 lines in 6 files changed: 0 ins; 1 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From alanb at openjdk.org Tue Nov 8 10:36:49 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 8 Nov 2022 10:36:49 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v6] In-Reply-To: <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> 
<6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> Message-ID: <XldjMSAe4YRggxFjq5SJYgm82U6cVC9wI_7XKcFK3YQ=.5caadfe1-250e-4c31-9da9-9c6243574177@github.com> On Mon, 7 Nov 2022 15:00:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Make memory session a pure lifetime abstraction src/java.base/share/classes/java/lang/ModuleLayer.java line 331: > 329: "enableNativeAccess"); > 330: target.implAddEnableNativeAccess(); > 331: return this; ModuleLayer.enableNativeAccess looks fine, we iterated on that in panama-foreign/pull/729. I assume you'll add @since 20. Also you might want to check the alignment, it looks like the method is indented by 5 instead of the usual 4 spaces. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From shade at openjdk.org Tue Nov 8 10:37:06 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 8 Nov 2022 10:37:06 GMT Subject: RFR: 8294591: Fix cast-function-type warning in TemplateTable [v5] In-Reply-To: <F9kScSH8YZ9kK8XAcl5pPPE6sPrHCBUc_zNriSaP8EU=.d5c7452c-72ed-4f41-9dea-5bd8a7fccbc0@github.com> References: <F9kScSH8YZ9kK8XAcl5pPPE6sPrHCBUc_zNriSaP8EU=.d5c7452c-72ed-4f41-9dea-5bd8a7fccbc0@github.com> Message-ID: <KbvGzI0aAbjwHV9ps_M_Wvkmcf5bgIF0etAUFYXkNTw=.0c55b7ce-b985-4abf-b4de-4d35381a9bfd@github.com> > After [JDK-8294314](https://bugs.openjdk.org/browse/JDK-8294314), we would have `templateTable.cpp` excluded with cast-function-type warning.
The underlying cause for it is casting functions for `ldc` bytecodes, which take `bool`-typed handlers: Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8294591-warning-cast-function-type-templatetable - Fix build failures - Merge branch 'master' into JDK-8294591-warning-cast-function-type-templatetable - Also disable warnings in gtests - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/10493/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10493&range=04 Stats: 46 lines in 9 files changed: 6 ins; 1 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/10493.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10493/head:pull/10493 PR: https://git.openjdk.org/jdk/pull/10493 From mcimadamore at openjdk.org Tue Nov 8 10:42:13 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 8 Nov 2022 10:42:13 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v8] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <1i9WM9QJ9WZsXv_3ZCsHRBDpG65nehSUjtww0HHTYcw=.c53b9cc2-45c9-49d4-bedf-1de71bf86f99@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: - Fix bad indent on ModuleLayer.Controller - Update src/java.base/share/classes/java/lang/foreign/package-info.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> - Update src/java.base/share/classes/java/lang/foreign/ValueLayout.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/cc4ff582..afb36a95 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=06-07 Stats: 11 lines in 3 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From fyang at openjdk.org Tue Nov 8 11:14:16 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 8 Nov 2022 11:14:16 GMT Subject: RFR: 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null [v2] In-Reply-To: <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com> References: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com> Message-ID: <-7G4VO7Q0fJ9mdlooNBFJvi72wmF9IrNp4t3ePTONPw=.941a55ad-d240-4723-988d-ed3f91c6d15a@github.com> On Tue, 8 Nov 2022 09:00:38 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> Please see the JBS issue for more crash details. 
>> >> To reproduce using a cross-compiled build: >> >> # dump one cds-nocoops.jsa >> <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:dump -Xlog:cds* -version >> >> # reproduce >> <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:on -XX:-TieredCompilation \ >> -Xlog:cds* -Xlog:gc+metaspace=info -jar renaissance-gpl-0.14.1.jar -r 1 movie-lens >> >> >> `MacroAssembler::en/decode_klass_not_null` uses the heapbase register as a temp register in the interpreter, which may kill the in-use value when enabling C2 compilation and `UseCompressedClassPointers` meanwhile disabling `UseCompressedOops`. C1 won't have this issue for the xheapbase is not its allocation candidate. When CDS is enabled, the narrow klass base is mapped to some address like `0x0000000800000000`, so `MacroAssembler::decode_klass_not_null`, which lacks registers, will use `xheapbase` as a temp to load the klass base and kill the register in the interpreter. So adding a `-XX:+DeoptimizeALot` can speedily reproduce the issue. >> >> To solve this, we shall decouple the xheapbase used as a temp register in `MacroAssembler::en/decode_klass_not_null`. AArch64 has advanced instructions so one register is enough to handle the logic. But in RISC-V we require at least two. >> >> This patch introduces another argument `tmp` to related functions to decouple and eliminate such heapbase usages. One thing that deserves noticing is the `cmp_klass` case, which usually gets used at the beginning of a method entry when `t1` is used as ic holder klass and `t0` is occupied there. These positions are special since nearly all registers are usable except ones used for arguments and special purposes (thread register, etc.). I propose to use a call-clobbered `t2` register here, to keep aligning the `i2c2i_adapter` logic[1]. >> >> Tested hotspot tier1~4 on QEMU; jdk tier1~tier2 and hotspot tier1~tier2 on my Hifive unmatched board, and the reproducible movie-lens benchmark. 
>> >> Thanks, >> Xiaolin >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L629 > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Fix as to comments Updated change looks good. Thanks for finding and fixing this. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/11010 From alanb at openjdk.org Tue Nov 8 11:27:03 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 8 Nov 2022 11:27:03 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v8] In-Reply-To: <1i9WM9QJ9WZsXv_3ZCsHRBDpG65nehSUjtww0HHTYcw=.c53b9cc2-45c9-49d4-bedf-1de71bf86f99@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <1i9WM9QJ9WZsXv_3ZCsHRBDpG65nehSUjtww0HHTYcw=.c53b9cc2-45c9-49d4-bedf-1de71bf86f99@github.com> Message-ID: <JECSFKnrC7PxS6M48mhn35efimvKQ_C_iuK8SvXaWUE=.bfc0324b-4cec-4ad7-8afb-9a3dc403a332@github.com> On Tue, 8 Nov 2022 10:42:13 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: > > - Fix bad indent on ModuleLayer.Controller > - Update src/java.base/share/classes/java/lang/foreign/package-info.java > > Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> > - Update src/java.base/share/classes/java/lang/foreign/ValueLayout.java > > Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> src/java.base/share/classes/java/lang/foreign/Arena.java line 34: > 32: * An arena allocates and manages the lifecycle of native segments. 
> 33: * <p> > 34: * An arena is a {@linkplain AutoCloseable closeable} segment allocator that has a {@link #session() memory session}. Should this be a link to MemorySession or a linkplain for memory session? src/java.base/share/classes/java/lang/foreign/Arena.java line 98: > 96: * that memory session are also released. > 97: * @throws IllegalStateException if the session associated with this arena is not {@linkplain MemorySession#isAlive() alive}. > 98: * @throws WrongThreadException if this method is called from a thread other than the thread Should this be qualified to say that it applies when the session is confined and the method is called from a thread other than the owner? ------------- PR: https://git.openjdk.org/jdk/pull/10872 From gcao at openjdk.org Tue Nov 8 11:41:22 2022 From: gcao at openjdk.org (Gui Cao) Date: Tue, 8 Nov 2022 11:41:22 GMT Subject: RFR: 8296515: RISC-V: Optimized MaxReductionV/MinReductionV/AddReductionV node implementation [v2] In-Reply-To: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> Message-ID: <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> > HI, > > The MaxReductionV, MinReductionV, AddReductionV nodes currently implemented by riscv rvv can be implemented by calling shared functions, and the T_BYTE and T_SHORT types in the MaxReductionV and MinReductionV node implementations can also be implemented in the same way as the T_INT type. > > Please take a look and have some reviews. Thanks a lot.
> > ## Testing: > - hotspot and jdk tier1 on unmatched board without new failures > - test/jdk/jdk/incubator/vector/* with fastdebug on qemu Gui Cao has updated the pull request incrementally with two additional commits since the last revision: - Use the same predicate as reduce_addI - Remove the REDUCTION_OP enumeration type and use Opcode to represent the operation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11036/files - new: https://git.openjdk.org/jdk/pull/11036/files/be5cf97b..b295be7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11036&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11036&range=00-01 Stats: 34 lines in 4 files changed: 7 ins; 3 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/11036.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11036/head:pull/11036 PR: https://git.openjdk.org/jdk/pull/11036 From gcao at openjdk.org Tue Nov 8 11:47:19 2022 From: gcao at openjdk.org (Gui Cao) Date: Tue, 8 Nov 2022 11:47:19 GMT Subject: RFR: 8296515: RISC-V: Optimized MaxReductionV/MinReductionV/AddReductionV node implementation [v2] In-Reply-To: <_TAOYTzfq92gnfJl4jbWo5Z7GusV-Rf2HY728GjZZR8=.16d0a295-780e-4393-9a03-5a10eb8b0a0e@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> <_TAOYTzfq92gnfJl4jbWo5Z7GusV-Rf2HY728GjZZR8=.16d0a295-780e-4393-9a03-5a10eb8b0a0e@github.com> Message-ID: <vZ4WODWZdeQKYW7XfMsnQ_8R-WhAsXvv8YhXhXJ7-Jc=.d84dd324-1f6c-4b4d-bf3c-a90057ed50c5@github.com> On Tue, 8 Nov 2022 09:36:07 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: > Given the predicate `Matcher::vector_element_basic_type(n->in(2)) != T_LONG`, is there a risk it's going to match for types that don't fit the assert at https://github.com/openjdk/jdk/pull/11036/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6R1696 ? 
> > It seems the same approach as `reduce_addI` should be taken with the following predicate: > > ``` > predicate(Matcher::vector_element_basic_type(n->in(2)) == T_BYTE || > Matcher::vector_element_basic_type(n->in(2)) == T_SHORT || > Matcher::vector_element_basic_type(n->in(2)) == T_INT); > ``` Thanks. I had been following the aarch64 implementation; to avoid unnecessary risk, the same predicate as reduce_addI is now used. > src/hotspot/cpu/riscv/riscv_v.ad line 855: > >> 853: ins_encode %{ >> 854: BasicType bt = Matcher::vector_element_basic_type(this, $src2); >> 855: __ rvv_reduce_integral($dst$$Register, as_VectorRegister($tmp$$reg), > > Same as https://github.com/openjdk/jdk/pull/11036#discussion_r1016358443 > Same as [#11036 (comment)](https://github.com/openjdk/jdk/pull/11036#discussion_r1016358443) Thanks, fixed. > src/hotspot/cpu/riscv/riscv_v.ad line 889: > >> 887: ins_encode %{ >> 888: BasicType bt = Matcher::vector_element_basic_type(this, $src2); >> 889: __ rvv_reduce_integral($dst$$Register, as_VectorRegister($tmp$$reg), > > Same as https://github.com/openjdk/jdk/pull/11036#discussion_r1016358443 > Same as [#11036 (comment)](https://github.com/openjdk/jdk/pull/11036#discussion_r1016358443) Thanks, fixed.
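For readers following the predicate discussion above, the difference between the exclusion form (`!= T_LONG`) and the allow-list form can be sketched in Java. This is an illustration only: the `ElemType` enum is a hypothetical, reduced stand-in for HotSpot's `BasicType`, and the sketch says nothing about which element types the matcher can actually present to these rules.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, reduced set of element types; HotSpot's BasicType has more members.
enum ElemType { T_BYTE, T_SHORT, T_INT, T_LONG, T_FLOAT, T_DOUBLE }

final class ReductionPredicates {
    private static final Set<ElemType> BYTE_SHORT_INT =
        EnumSet.of(ElemType.T_BYTE, ElemType.T_SHORT, ElemType.T_INT);

    // Exclusion form: accepts every type except T_LONG, floating-point types included.
    static boolean notLong(ElemType t) {
        return t != ElemType.T_LONG;
    }

    // Allow-list form: accepts only the three types named in the suggested predicate.
    static boolean byteShortOrInt(ElemType t) {
        return BYTE_SHORT_INT.contains(t);
    }
}
```

Both forms agree on T_BYTE, T_SHORT, T_INT and T_LONG; they differ only on types outside that group, which is where the allow-list is the more conservative choice.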
------------- PR: https://git.openjdk.org/jdk/pull/11036 From coleenp at openjdk.org Tue Nov 8 11:47:19 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 11:47:19 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call In-Reply-To: <-IUgUziZB4xaQHdy5yU4HFWyH9CP9VzMkH6MS-QATBY=.ae540ff2-d629-4459-bc5a-bb0da8e7f784@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <-IUgUziZB4xaQHdy5yU4HFWyH9CP9VzMkH6MS-QATBY=.ae540ff2-d629-4459-bc5a-bb0da8e7f784@github.com> Message-ID: <vqyhBp--YAdSuJzmBTDOatoe2Moovw1evIuauCnkJCw=.d53b2fd2-d81e-4da1-9286-82eeb25b20b4@github.com> On Tue, 8 Nov 2022 10:05:34 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > src/java.base/share/classes/java/lang/ThreadGroup.java line 802: > >> 800: int length = groups.size(); >> 801: return groups.toArray(new ThreadGroup[length]); >> 802: } > > Would you mind changing the comment to "Returns a snapshot of the subgroups as an array, used by JVMTI.", only to be consistent with the other methods that return subgroups as they all document it as a "snapshot". Also I think I would rename this to subgroupsAsArray as there isn't a "subgroups array". > > If you want, `return groups.toArray(new ThreadGroup[0]);` will do the sizing for you. > > Minor nit, you've additional blank lines after this method, I assume you only meant to add one. Thank you for looking at this - that's a much better name and comment and thanks for the Java hints (I copied that from somewhere else). 
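The snapshot-as-array pattern discussed above can be sketched as follows. This is a minimal illustration, not the code in the PR: the class `Group` and its members are hypothetical stand-ins for `java.lang.ThreadGroup`, and only `List.toArray(T[])` is real JDK API. Passing a zero-length array lets `toArray` allocate a result of the right size, which is the sizing hint mentioned in the review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a group that tracks subgroups; not the JDK's ThreadGroup.
final class Group {
    private final String name;
    private final List<Group> groups = new ArrayList<>();

    Group(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    synchronized void add(Group child) {
        groups.add(child);
    }

    // Returns a snapshot of the subgroups as an array. The copy is taken while
    // holding this object's monitor, so callers need no lock of their own.
    synchronized Group[] subgroupsAsArray() {
        return groups.toArray(new Group[0]);
    }
}
```

Because the returned array is a copy, later additions to the group do not show up in a snapshot that was taken earlier.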
------------- PR: https://git.openjdk.org/jdk/pull/11033 From coleenp at openjdk.org Tue Nov 8 11:56:23 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 11:56:23 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call In-Reply-To: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> Message-ID: <cHN9fXVFwhTaAY0IFvLyi6br3APFGKkuSulLvE3GyWE=.4dc2578c-ef41-428b-b026-66c7b58dee63@github.com> On Tue, 8 Nov 2022 00:58:44 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. > Tested with tier 1-4. Thanks Alan for your comment - yes, I think doing upcalls from JVMTI wasn't done before because of possible events, and it was the way things were done. I'm glad this is okay now. I'm going to file an issue for the error handling. From AddToSystemClassLoaderSearch - this is mapped to JVMTI_ERROR_INTERNAL when it's clearly (?) an OOM but there should be some sort of utility function. Maybe there is already? 
    // need the path as java.lang.String
    Handle path = java_lang_String::create_from_platform_dependent_str(segment, THREAD);
    if (HAS_PENDING_EXCEPTION) {
      CLEAR_PENDING_EXCEPTION;
      return JVMTI_ERROR_INTERNAL;
    }

-------------

PR: https://git.openjdk.org/jdk/pull/11033

From coleenp at openjdk.org Tue Nov 8 12:30:37 2022
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 8 Nov 2022 12:30:37 GMT
Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v2]
In-Reply-To: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com>
References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com>
Message-ID: <IwsuhgkAcoCh4lrxNHrzRQUqC1-uwHX94txC8IzVpYA=.3cf4c50f-f259-41f0-b29b-2412a469d376@github.com>

> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead.
> Tested with tier 1-4.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Handle non OOM exceptions and rename subgroupsAsArray.
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/11033/files
  - new: https://git.openjdk.org/jdk/pull/11033/files/a916f3fd..8615b179

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=00-01

  Stats: 12 lines in 2 files changed: 5 ins; 2 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/11033.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11033/head:pull/11033

PR: https://git.openjdk.org/jdk/pull/11033

From coleenp at openjdk.org Tue Nov 8 12:30:38 2022
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 8 Nov 2022 12:30:38 GMT
Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v2]
In-Reply-To: <pyVBMbAXWvOcHcrjJhE5TXK4uXdYd4rMXo-jKtqjNZI=.96e20218-354a-4911-b3f3-e0a70aaed854@github.com>
References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com>
 <pyVBMbAXWvOcHcrjJhE5TXK4uXdYd4rMXo-jKtqjNZI=.96e20218-354a-4911-b3f3-e0a70aaed854@github.com>
Message-ID: <EbjNSEDMPnvcxh0cS8HZPulLV9_tKQ68f9Z3mO8XwfQ=.d9a8b138-d242-4b6d-9258-ea4bdae86f32@github.com>

On Tue, 8 Nov 2022 04:35:48 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Handle non OOM exceptions and rename subgroupsAsArray.
>
> src/hotspot/share/prims/jvmtiEnvBase.cpp line 810:
>
>> 808:   if (HAS_PENDING_EXCEPTION) {
>> 809:     CLEAR_PENDING_EXCEPTION;
>> 810:     return JVMTI_ERROR_OUT_OF_MEMORY;
>
> Do we need to handle unexpected exceptions better, rather than just claiming they are OOME?

I added a case for returning JVMTI_ERROR_INTERNAL also. In the spec (https://docs.oracle.com/en/java/javase/11/docs/specs/jvmti.html#universal-error), the other universal errors are inapplicable, and are already checked in the case of JVMTI_ERROR_INVALID_THREAD_GROUP and the invalid environment one (by the jvmti code wrapper).
------------- PR: https://git.openjdk.org/jdk/pull/11033 From alanb at openjdk.org Tue Nov 8 12:50:52 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 8 Nov 2022 12:50:52 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v2] In-Reply-To: <IwsuhgkAcoCh4lrxNHrzRQUqC1-uwHX94txC8IzVpYA=.3cf4c50f-f259-41f0-b29b-2412a469d376@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <IwsuhgkAcoCh4lrxNHrzRQUqC1-uwHX94txC8IzVpYA=.3cf4c50f-f259-41f0-b29b-2412a469d376@github.com> Message-ID: <KdrCQ8cwAapDVaO3-qFNazmJUCxsSSMOh6S4F4f17jo=.be631c28-01b0-4f50-83d4-325729d44042@github.com> On Tue, 8 Nov 2022 12:30:37 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Handle non OOM exceptions and rename subgroupsAsArray. I wonder if the intermediate resource array is needed now. With the change, subgroupsAsArray returns a Java array, JvmtiEnvBase::get_subgroups creates a resource array with a handle to each of the thread group oops, then JvmtiEnvBase::new_jthreadArray creates a new local ref for each group. src/java.base/share/classes/java/lang/ThreadGroup.java line 796: > 794: > 795: /** > 796: * Returns an snapshot of the subgroups as an array, used by JVMTI. The update looks good, just a typo here where it should be "a snapshot". 
------------- PR: https://git.openjdk.org/jdk/pull/11033 From coleenp at openjdk.org Tue Nov 8 12:56:22 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 12:56:22 GMT Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability In-Reply-To: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> Message-ID: <ZJRBAQREVp5EPW0aG1QT0BUA1nYAwsAMOQZBWSOj_hI=.cb00d69e-b8cf-4c0a-b3a5-33299287ee33@github.com> On Mon, 7 Nov 2022 13:25:53 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Refactor the STEP macro in VMError::report to improve readability. > Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand. > > This enhancement aims to do two things: > 1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer. > 2. Separates the internal step logic from the decision logic which decides if a step should be taken with a STEP_IF(step_name_str, condition) macro > > Testing: tier 1 + GHA Ok, I was afraid to look at this. I do like STEP_IF. Maybe the longer expressions could be functions ? 
------------- PR: https://git.openjdk.org/jdk/pull/11018 From coleenp at openjdk.org Tue Nov 8 13:06:50 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 13:06:50 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v3] In-Reply-To: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> Message-ID: <XN6ZzLnQtSP1qDsw8xs9Ueb3bQH2Ymu-vCbtSs7YbBE=.d4779a8a-d63f-4732-b403-f83cac3d46af@github.com> > The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. > Tested with tier 1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11033/files - new: https://git.openjdk.org/jdk/pull/11033/files/8615b179..120bee6b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11033.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11033/head:pull/11033 PR: https://git.openjdk.org/jdk/pull/11033 From coleenp at openjdk.org Tue Nov 8 13:06:50 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 13:06:50 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v2] In-Reply-To: <KdrCQ8cwAapDVaO3-qFNazmJUCxsSSMOh6S4F4f17jo=.be631c28-01b0-4f50-83d4-325729d44042@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <IwsuhgkAcoCh4lrxNHrzRQUqC1-uwHX94txC8IzVpYA=.3cf4c50f-f259-41f0-b29b-2412a469d376@github.com> 
 <KdrCQ8cwAapDVaO3-qFNazmJUCxsSSMOh6S4F4f17jo=.be631c28-01b0-4f50-83d4-325729d44042@github.com>
Message-ID: <ETOuz3d8Z4OO-xmvzdcEUDZsjUuWO1hLKYR93EYRSA8=.946c5472-cad1-44d9-8dc0-739218b2d00b@github.com>

On Tue, 8 Nov 2022 12:48:01 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Handle non OOM exceptions and rename subgroupsAsArray.
>
> src/java.base/share/classes/java/lang/ThreadGroup.java line 796:
>
>> 794:
>> 795:     /**
>> 796:      * Returns an snapshot of the subgroups as an array, used by JVMTI.
>
> The update looks good, just a typo here where it should be "a snapshot".

fixed.

-------------

PR: https://git.openjdk.org/jdk/pull/11033

From jsjolen at openjdk.org Tue Nov 8 13:27:26 2022
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Tue, 8 Nov 2022 13:27:26 GMT
Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v4]
In-Reply-To: <jyBu0bk0j4GViq064bUHR_IEZniVY04-oYV4fZ4QTj0=.8886438b-371a-4fac-8c31-7dc809aec42d@github.com>
References: <lkbUp9kcZikb5T5OBU-ey8NbWeg15hpjP2KREyxZprA=.db7acfbd-3cff-4053-830b-74e3f594b5f8@github.com>
 <jyBu0bk0j4GViq064bUHR_IEZniVY04-oYV4fZ4QTj0=.8886438b-371a-4fac-8c31-7dc809aec42d@github.com>
Message-ID: <8cjC6Gl2tXGHaNMS4fLnADwwpKk5ZfQ8-L2CUGEOI98=.27d0b48a-4929-4c7d-a4da-ca7239d2ece4@github.com>

On Thu, 27 Oct 2022 10:32:38 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Hi!
>>
>> This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization.
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove WizardMode && Verbose as per Coleen, add back WizardMode as per DHolmes

Regarding:

>UL has the precise description of locals missing, looks like this in PrintDeoptimizationDetails:

This is not printed because that info comes from `javaVFrame::print`.
`vframe::new_vframe` can return quite a few different `vframe` subclasses, and as such each of these needs to have their `print` be refactored into `print_on + print`. ------------- PR: https://git.openjdk.org/jdk/pull/10645 From mcimadamore at openjdk.org Tue Nov 8 13:28:58 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 8 Nov 2022 13:28:58 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v9] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <B1_b3U4qE-cnF0jtLHfpn2k5NWFj1jadMO6K3VTMJFk=.03e3366d-0920-4120-8878-92199c61950e@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Rework package-level javadoc for restricted methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/afb36a95..e2840232 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=07-08 Stats: 17 lines in 1 file changed: 9 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From stefank at openjdk.org Tue Nov 8 13:44:25 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 8 Nov 2022 13:44:25 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v3] In-Reply-To: 
<zfD8vrCtZwn0XEwFR_Yyyd1mgY_FsBk2_-vq3T58vTM=.9b0eda4f-1881-480a-9321-833491067b9b@github.com> References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> <p9yT4iQ_9jf2UlD-ILu6l7_uD7YDFGlWoCp8uw9buHI=.4702a6ae-9a7e-4ca4-a23f-ca99ee3b01e2@github.com> <zfD8vrCtZwn0XEwFR_Yyyd1mgY_FsBk2_-vq3T58vTM=.9b0eda4f-1881-480a-9321-833491067b9b@github.com> Message-ID: <qYnLTGwuUrvhreT0IWsLu_1NHQEj7FhNjNuXXN9Fe54=.7caefc29-9036-4aa0-adf5-5c8a78484a39@github.com> On Fri, 21 Oct 2022 13:19:40 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Seems like the riscv port uses virtual destructors on classes that inherits from ResourceObj. This requires the delete operator to be defined. [C++ Standard](https://eel.is/c++draft/class.dtor#16) > > It occurs in the Assembler, MacroAssembler and InterpreterMacroAssembler which all have empty virtual destructors. And in SignatureHandlerGenerator which NULLs an internal field. Any of the RISCV porters that know why virtual destructors are used in this way, and if they are necessary. > > I can compile `make CONF=riscv hotspot` with the virtual destructors removed. @RealFYang could you take a look at the proposed patch to remove these destructors?: https://github.com/openjdk/jdk/commit/3ebe35bff744dbee6bcc509e2ce0a7aafaf0f1ea Currently, those destructors are blocking the progress of this patch. An alternative would be to reinstate `ResourceObj::operator delete` but I think we'd prefer not to do that. 
------------- PR: https://git.openjdk.org/jdk/pull/10745 From alanb at openjdk.org Tue Nov 8 13:47:15 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 8 Nov 2022 13:47:15 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v3] In-Reply-To: <XN6ZzLnQtSP1qDsw8xs9Ueb3bQH2Ymu-vCbtSs7YbBE=.d4779a8a-d63f-4732-b403-f83cac3d46af@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <XN6ZzLnQtSP1qDsw8xs9Ueb3bQH2Ymu-vCbtSs7YbBE=.d4779a8a-d63f-4732-b403-f83cac3d46af@github.com> Message-ID: <yn-0fZ7wYyZN4VUqWUG0iQtxmo4sMqOX4kG_I8PdCS0=.1992967d-d5ce-4662-b451-e239c0ddf357@github.com> On Tue, 8 Nov 2022 13:06:50 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > fix typo Marked as reviewed by alanb (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11033 From jsjolen at openjdk.org Tue Nov 8 13:51:42 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 8 Nov 2022 13:51:42 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v5] In-Reply-To: <lkbUp9kcZikb5T5OBU-ey8NbWeg15hpjP2KREyxZprA=.db7acfbd-3cff-4053-830b-74e3f594b5f8@github.com> References: <lkbUp9kcZikb5T5OBU-ey8NbWeg15hpjP2KREyxZprA=.db7acfbd-3cff-4053-830b-74e3f594b5f8@github.com> Message-ID: <87Sl2YDh3Saxta3kbEJb96PgtnfqcnkQxZmVLmZ_34A=.ed172361-8a07-4f0f-bb43-fe31a163a155@github.com> > Hi! > > This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. 
Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - entryVFrame, externalVFrame: print_on refactoring - javaVFrame: print_on refactoring - StackValueCollection: Refactor into print_on ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10645/files - new: https://git.openjdk.org/jdk/pull/10645/files/b375c9f1..70c0e715 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10645&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10645&range=03-04 Stats: 63 lines in 4 files changed: 25 ins; 3 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/10645.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10645/head:pull/10645 PR: https://git.openjdk.org/jdk/pull/10645 From sjohanss at openjdk.org Tue Nov 8 14:03:26 2022 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 8 Nov 2022 14:03:26 GMT Subject: RFR: 8295871: G1: Use different explicit claim marks for CLDs In-Reply-To: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> References: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> Message-ID: <vhGem0bMMzPgrZwb-dnrsB-9csBm7dY-odIVHJx3UyE=.d5f84f0a-b4fc-42ff-9fc4-8b1cc36f5f03@github.com> On Fri, 4 Nov 2022 14:46:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: > Hi all, > > can I have reviews for this follow-up to [JDK-8295118](https://bugs.openjdk.org/browse/JDK-8295118) that removes the need to clear CLD claim marks for every full gc phase by using different claim values for the different phases. > > Some comments: > * I used new g1 specific claim values instead of overloading the existing ones, which is imho clearer. I am open to better names, but something like `_claim_strong_2/3` seemed too cryptic. Then again, there is now a collector specific name in the enum. Maybe the enum values should be made collector-specific in some way? Currently they already are (e.g.
`_claim_finalizable` is only used in ZGC) as G1 does not need the values except for (multiple) `_claim_strong`. > * I moved the CLD mark verification for the mark phase from `prepare_collection` to the constructor of `G1FullGCMarker`; I think this place is more fitting as directly above there is the use in the `CLDToOopClosure`. Also this pattern aligns with the use in the `G1FullGCAdjustTask`. > > Testing: tier1-5 > > Thanks, > Thomas Looks good. One comment below that you can decide if you address or not. src/hotspot/share/classfile/classLoaderData.hpp line 210: > 208: _claim_other = 4, > 209: _claim_strong_g1_fullgc_mark = 8, > 210: _claim_strong_g1_fullgc_adjust = 16 I do agree that the naming is not optimal and I wonder if using more generic names would be better for now. Like just `_claim_mark` and `_claim_adjust` and look at making them GC-specific should be a different change that addresses all GC specific values. I also think `_claim_other` could continue to be last. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.org/jdk/pull/10989 From alanb at openjdk.org Tue Nov 8 14:34:14 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 8 Nov 2022 14:34:14 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <03q_uZD-g1_ZiTYXVINTiCoe9tc26XUyvkUCmodHogM=.6c45ba81-4daf-4132-a51b-3613ad587030@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. 
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   really revert the file

I checked the JPLIS agent (= j.l.instrument implementation) with a custom system class loader that doesn't define an appendToClassPathForInstrumentation method and it isn't handled correctly. The right behavior should be for AddToSystemClassLoaderSearch to throw UOE but instead it aborts the VM. So I think we'll create a JBS issue for that.

If the custom system class loader does define appendToClassPathForInstrumentation then it will be called, it's just not possible for it to delegate it to the application class loader's appendToClassPathForInstrumentation. It's very possible this will lead to some anomalies, as the defining class loader for the classes on the original class path will be the app class loader, but for any classes added by the agent at runtime (after startup) it will likely be the custom system class loader.

This concerns Java agents as opposed to JVMTI agents, but it does suggest that the combination of a custom system class loader and agents augmenting the class path at runtime is not well tested. So ObjectLocker or not, it is unlikely to be detected by any tests.
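As a rough illustration of the second case described above, a custom system class loader that does define the method could look like the sketch below. The class name `AgentAwareLoader` and the choice of `URLClassLoader` as base class are assumptions made for this example; only the method name `appendToClassPathForInstrumentation` and its single `String` parameter come from the `java.lang.instrument` convention mentioned in the thread.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Hypothetical custom system class loader; illustrative only.
// For real use with -Djava.system.class.loader it would be a public class in its own file.
class AgentAwareLoader extends URLClassLoader {
    // A system class loader is constructed with the default loader as its parent.
    public AgentAwareLoader(ClassLoader parent) {
        super(new URL[0], parent);
    }

    // Found by name when an agent adds a JAR to the system class loader search.
    // It cannot forward to the application class loader's own method,
    // so the added path is searched by this loader instead.
    void appendToClassPathForInstrumentation(String path) {
        try {
            addURL(Paths.get(path).toUri().toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(path, e);
        }
    }
}
```

Classes loaded from a path added this way are defined by the custom loader, while classes on the original class path are still defined by the parent, which is the kind of split the message above warns about.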
------------- PR: https://git.openjdk.org/jdk/pull/11023 From coleenp at openjdk.org Tue Nov 8 14:48:03 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 14:48:03 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v2] In-Reply-To: <KdrCQ8cwAapDVaO3-qFNazmJUCxsSSMOh6S4F4f17jo=.be631c28-01b0-4f50-83d4-325729d44042@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <IwsuhgkAcoCh4lrxNHrzRQUqC1-uwHX94txC8IzVpYA=.3cf4c50f-f259-41f0-b29b-2412a469d376@github.com> <KdrCQ8cwAapDVaO3-qFNazmJUCxsSSMOh6S4F4f17jo=.be631c28-01b0-4f50-83d4-325729d44042@github.com> Message-ID: <E5TrmHiFVi0A6jQvpdH0WZSCzNUPq5jV7lCon0eJNF4=.2a3ffd91-61ab-4d7a-aded-5bf147bbd743@github.com> On Tue, 8 Nov 2022 12:47:29 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Handle non OOM exceptions and rename subgroupsAsArray. > > I wonder if the intermediate resource array is needed now. With the change, subgroupsAsArray returns a Java array, JvmtiEnvBase::get_subgroups creates a resource array with a handle to each of the thread group oops, then JvmtiEnvBase::new_jthreadArray creates a new local ref for each group. @AlanBateman you're right the extra copy of the thread group array is wasteful and since I've changed this code already, I cleaned up the extra copying. Reran jvmti tests. 
------------- PR: https://git.openjdk.org/jdk/pull/11033 From coleenp at openjdk.org Tue Nov 8 14:47:59 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 14:47:59 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v4] In-Reply-To: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> Message-ID: <u494IZor3jcyrW8bwjGKGby5vG49d0Oh1YlO50Wl0L4=.41b79c8c-1ad9-4474-871c-2411abf80e60@github.com> > The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. > Tested with tier 1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Clean up extra copy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11033/files - new: https://git.openjdk.org/jdk/pull/11033/files/120bee6b..89955d30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=02-03 Stats: 40 lines in 3 files changed: 10 ins; 20 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/11033.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11033/head:pull/11033 PR: https://git.openjdk.org/jdk/pull/11033 From yadongwang at openjdk.org Tue Nov 8 14:48:16 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Tue, 8 Nov 2022 14:48:16 GMT Subject: RFR: 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null [v2] In-Reply-To: <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com> References: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com> Message-ID: 
<ex_4VH3mqzj5oOy-DBjy2GaKZmtCoRq3h5xHmQ0Fe1c=.e49b3098-5205-4f5f-9c3d-e4ad7cc8375d@github.com> On Tue, 8 Nov 2022 09:00:38 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> Please see the JBS issue for more crash details. >> >> To reproduce using a cross-compiled build: >> >> # dump one cds-nocoops.jsa >> <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:dump -Xlog:cds* -version >> >> # reproduce >> <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:on -XX:-TieredCompilation \ >> -Xlog:cds* -Xlog:gc+metaspace=info -jar renaissance-gpl-0.14.1.jar -r 1 movie-lens >> >> >> `MacroAssembler::en/decode_klass_not_null` uses the heapbase register as a temp register in the interpreter, which may kill the in-use value when enabling C2 compilation and `UseCompressedClassPointers` meanwhile disabling `UseCompressedOops`. C1 won't have this issue for the xheapbase is not its allocation candidate. When CDS is enabled, the narrow klass base is mapped to some address like `0x0000000800000000`, so `MacroAssembler::decode_klass_not_null`, which lacks registers, will use `xheapbase` as a temp to load the klass base and kill the register in the interpreter. So adding a `-XX:+DeoptimizeALot` can speedily reproduce the issue. >> >> To solve this, we shall decouple the xheapbase used as a temp register in `MacroAssembler::en/decode_klass_not_null`. AArch64 has advanced instructions so one register is enough to handle the logic. But in RISC-V we require at least two. >> >> This patch introduces another argument `tmp` to related functions to decouple and eliminate such heapbase usages. One thing that deserves noticing is the `cmp_klass` case, which usually gets used at the beginning of a method entry when `t1` is used as ic holder klass and `t0` is occupied there. These positions are special since nearly all registers are usable except ones used for arguments and special purposes (thread register, etc.). 
I propose to use a call-clobbered `t2` register here, to keep aligning the `i2c2i_adapter` logic[1]. >> >> Tested hotspot tier1~4 on QEMU; jdk tier1~tier2 and hotspot tier1~tier2 on my Hifive unmatched board, and the reproducible movie-lens benchmark. >> >> Thanks, >> Xiaolin >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L629 > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Fix as to comments lgtm @zhengxiaolinX Nice catch. Looks good, but I'm not sure that all temporary registers are used safely, especially where non-t0 is used? ------------- Marked as reviewed by yadongwang (Author). PR: https://git.openjdk.org/jdk/pull/11010 From coleenp at openjdk.org Tue Nov 8 14:55:17 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 14:55:17 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v5] In-Reply-To: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> Message-ID: <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com> > The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. > Tested with tier 1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Forgot a null check. 
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/11033/files
  - new: https://git.openjdk.org/jdk/pull/11033/files/89955d30..5bfec2e9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=03-04

  Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/11033.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11033/head:pull/11033

PR: https://git.openjdk.org/jdk/pull/11033

From luhenry at openjdk.org Tue Nov 8 16:18:22 2022
From: luhenry at openjdk.org (Ludovic Henry)
Date: Tue, 8 Nov 2022 16:18:22 GMT
Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V
In-Reply-To: <J4kgVWIrCihh44u0d9KOAK9UjgP5RFfoxKIXPr1aMUI=.112d58bd-e9e4-4396-b204-94db972c364c@github.com>
References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com>
 <1yV4wylyTjD5-MuT7o_IylkxgU_O3xryd3cRlULgdbY=.9b73b7e9-5de1-47dc-ba6f-9dc30b400a9e@github.com>
 <J4kgVWIrCihh44u0d9KOAK9UjgP5RFfoxKIXPr1aMUI=.112d58bd-e9e4-4396-b204-94db972c364c@github.com>
Message-ID: <bbhND5V2Qtab9H3aTFw7nVG_3Pf50YY_moqhN8bcQRA=.13f9e2c4-051d-4935-9114-cad18f1c9c22@github.com>

On Tue, 1 Nov 2022 06:56:13 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> @RealFYang let me know what you think. Thanks!
>
> @luhenry : Sorry for the late reply. I am looking at this now.

@RealFYang @yadongw could you please review+sponsor if it all looks good to you? Thanks!
------------- PR: https://git.openjdk.org/jdk/pull/10884 From jvernee at openjdk.org Tue Nov 8 16:20:51 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 8 Nov 2022 16:20:51 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v9] In-Reply-To: <B1_b3U4qE-cnF0jtLHfpn2k5NWFj1jadMO6K3VTMJFk=.03e3366d-0920-4120-8878-92199c61950e@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <B1_b3U4qE-cnF0jtLHfpn2k5NWFj1jadMO6K3VTMJFk=.03e3366d-0920-4120-8878-92199c61950e@github.com> Message-ID: <c1qV9PAvdc0F_hMKotT9UDO7x6Qea9bKM9XwNy887fo=.c54d69df-8432-45fe-bbd1-5d9729c3bb32@github.com> On Tue, 8 Nov 2022 13:28:58 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Rework package-level javadoc for restricted methods Did a full review. Only some minor comments. Also, please add attribution with `/contributor add @<user>` for the people that contributed. (I think you have to add yourself as well, if you do that). src/java.base/share/classes/java/lang/foreign/GroupLayout.java line 57: > 55: > 56: @Override > 57: GroupLayout withName(String name); It looks like this method, and `withBitAlignment` below have no javadoc? Does this need `inheritDoc`? 
src/java.base/share/classes/java/lang/foreign/Linker.java line 75: > 73: * <ul> > 74: * <li>if {@code L} is a {@link ValueLayout} with carrier {@code E} then {@code C = E}; or</li> > 75: * <li>if {@code L} is a {@link GroupLayout}, then {@code C} is set to {@code MemorySegment.class}</li> Now that we have `FunctionDescriptor::toMethodType` I think this paragraph could be simplified by just referencing that. src/java.base/share/classes/java/lang/foreign/Linker.java line 101: > 99: * <ul> > 100: * <li>if {@code L} is a {@link ValueLayout} with carrier {@code E} then {@code C = E}; or</li> > 101: * <li>if {@code L} is a {@link GroupLayout}, then {@code C} is set to {@code MemorySegment.class}</li> Same here. This is covered by the doc of `FunctionDescriptor::toMethodType`. src/java.base/share/classes/java/lang/foreign/Linker.java line 119: > 117: * <li>The memory session of {@code A} is {@linkplain MemorySession#isAlive() alive}. Otherwise, the invocation throws > 118: * {@link IllegalStateException};</li> > 119: * <li>The invocation occurs in same thread as the one {@linkplain MemorySession#isOwnedBy(Thread) owning} the memory session of {@code R}, Suggestion: * <li>The invocation occurs in same thread as the one {@linkplain MemorySession#isOwnedBy(Thread) owning} the memory session of {@code A}, src/java.base/share/classes/java/lang/foreign/Linker.java line 121: > 119: * <li>The invocation occurs in same thread as the one {@linkplain MemorySession#isOwnedBy(Thread) owning} the memory session of {@code R}, > 120: * if said session is confined. 
Otherwise, the invocation throws {@link WrongThreadException}; and</li> > 121: * <li>The memory session of {@code R} is <em>kept alive</em> (and cannot be closed) during the invocation.</li> Suggestion: * <li>The memory session of {@code A} is <em>kept alive</em> (and cannot be closed) during the invocation.</li> src/java.base/share/classes/java/lang/foreign/StructLayout.java line 43: > 41: > 42: @Override > 43: StructLayout withName(String name); Missing `inheritDoc`? src/java.base/share/classes/java/lang/foreign/UnionLayout.java line 43: > 41: > 42: @Override > 43: UnionLayout withName(String name); Missing `inheritDoc`? src/java.base/share/classes/java/lang/foreign/VaList.java line 44: > 42: * Helper class to create and manipulate variable argument lists, similar in functionality to a C {@code va_list}. > 43: * <p> > 44: * A variable argument list segment can be created using the {@link #make(Consumer, MemorySession)} factory, as follows: Suggestion: * A variable argument list can be created using the {@link #make(Consumer, MemorySession)} factory, as follows: src/java.base/share/classes/java/lang/foreign/VaList.java line 50: > 48: * .addVarg(C_DOUBLE, 3.8d)); > 49: *} > 50: * Once created, clients can obtain the platform-dependent {@linkplain #segment() memory segment} associated a variable Suggestion: * Once created, clients can obtain the platform-dependent {@linkplain #segment() memory segment} associated with a variable src/java.base/share/classes/java/lang/foreign/ValueLayout.java line 134: > 132: > 133: @Override > 134: ValueLayout withName(String name); Missing `inheritDoc` here as well, and on other withers below. src/java.base/share/classes/java/lang/foreign/ValueLayout.java line 356: > 354: * Equivalent to the following code: > 355: * {@snippet lang=java : > 356: * ADDRESS.of(ByteOrder.nativeOrder()) This code doesn't look correct. 
It also looks like OfAddress layouts have their alignment set to the address size already, so the alignment adjustment here seems unnecessary as well. src/java.base/share/classes/java/lang/foreign/ValueLayout.java line 367: > 365: * Equivalent to the following code: > 366: * {@snippet lang=java : > 367: * JAVA_BYTE.of(ByteOrder.nativeOrder()).withBitAlignment(8); Same here (and for the other snippets below), `OfByte` doesn't have an `of` method. This looks maybe like a regex-replace error. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From jvernee at openjdk.org Tue Nov 8 16:20:53 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 8 Nov 2022 16:20:53 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v6] In-Reply-To: <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <6KFOS0uVml9eRkWm9inRT0um8oEV_kUw3UZPKT_p67Q=.f330d3e5-5579-4361-8963-763928018e9a@github.com> Message-ID: <mnL8CNN1hqmOuLXzz5ZZo-p-bPv__ghv-cedWvLKMTU=.d703a0d4-5649-4f2c-b179-c490d926d7a1@github.com> On Mon, 7 Nov 2022 15:00:02 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Make memory session a pure lifetime abstraction src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java line 157: > 155: public long mismatch(MemorySegment other) { > 156: Objects.requireNonNull(other); > 157: return MemorySegment.mismatch(this, 0, byteSize(), other, 0, other.byteSize()); Bit strange to see this calling back up to a method in the interface. Maybe this should just be a `default` method in `MemorySegment`? src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java line 163: > 161: * Mismatch over long lengths. > 162: */ > 163: public static long vectorizedMismatchLargeForBytes(MemorySessionImpl aSession, MemorySessionImpl bSession, Does this need to be `public`? Only seems to be referenced below. src/java.base/share/classes/jdk/internal/foreign/MemorySessionImpl.java line 179: > 177: @ForceInline > 178: public static MemorySessionImpl toSessionImpl(MemorySession session) { > 179: return (MemorySessionImpl)session; Maybe calls to this method should just be replaced with a cast. src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/linux/LinuxAArch64VaList.java line 136: > 134: long ptr = UNSAFE.allocateMemory(LAYOUT.byteSize()); > 135: MemorySegment ms = MemorySegment.ofAddress(ptr, LAYOUT.byteSize(), > 136: MemorySession.implicit(), () -> UNSAFE.freeMemory(ptr)); pre-existing, but it seems like this could just use `MemorySegment.allocateNative(LAYOUT, MemorySession.implicit())`? 
Suggestion: MemorySegment base = MemorySegment.allocateNative(LAYOUT, MemorySession.implicit()); (and remove the dependency on `Unsafe` altogether) src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/linux/LinuxAArch64VaList.java line 142: > 140: VH_gr_offs.set(ms, 0); > 141: VH_vr_offs.set(ms, 0); > 142: return ms; I suggest doing Suggestion: return ms.asSlice(0, 0); To create an opaque segment, just like the `segment()` accessor does. Or maybe update the implementation of `SharedUtils.emptyVaList` to do this. src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/linux/LinuxAArch64VaList.java line 408: > 406: @Override > 407: public MemorySegment segment() { > 408: return segment.asSlice(0, 0); A comment about what is happening here would be nice. (making sure the returned segment is opaque?) src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/macos/MacOsAArch64VaList.java line 176: > 174: @Override > 175: public MemorySegment segment() { > 176: return segment.asSlice(0, 0); Same here. 
src/java.base/share/classes/jdk/internal/foreign/abi/x64/sysv/SysVVaList.java line 145:

> 143: long ptr = U.allocateMemory(LAYOUT.byteSize());
> 144: MemorySegment base = MemorySegment.ofAddress(ptr, LAYOUT.byteSize(),
> 145: MemorySession.implicit(), () -> U.freeMemory(ptr));

Same here: `MemorySegment base = MemorySegment.allocateNative(LAYOUT, MemorySession.implicit());`

src/java.base/share/classes/jdk/internal/foreign/abi/x64/sysv/SysVVaList.java line 150:

> 148: VH_overflow_arg_area.set(base, MemorySegment.NULL);
> 149: VH_reg_save_area.set(base, MemorySegment.NULL);
> 150: return base;

Suggestion:

    return base.asSlice(0, 0);

test/jdk/java/foreign/normalize/TestNormalize.java line 203:

> 201: public static Object[][] bools() {
> 202: return new Object[][]{
> 203: { 0b01, true }, // zero least significant bit, but non-zero first byte

According to the comment this should actually be:
Suggestion:

    { 0b10, true }, // zero least significant bit, but non-zero first byte

Looks like I wrote this by mistake :(

-------------

PR: https://git.openjdk.org/jdk/pull/10872

From eosterlund at openjdk.org Tue Nov 8 16:29:09 2022
From: eosterlund at openjdk.org (Erik Österlund)
Date: Tue, 8 Nov 2022 16:29:09 GMT
Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code
Message-ID: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com>

Generational ZGC will need to patch nmethod instructions outside of safepoints, and guard entries into the nmethods with cross modifying code fences. This is mostly taken care of by nmethod entry barrier code. But there are a few entries that don't go through nmethod entry barriers that need fixing. In particular when entering an nmethod by returning through the stack watermark barrier.
This patch ensures that whenever the stack watermark barrier exposes a new nmethod, we also ensure that a cross modify fence is executed, so that any concurrently updated instructions can be safely executed. ------------- Commit messages: - 8295214: Generational ZGC: Guard nmethods from cross modifying code Changes: https://git.openjdk.org/jdk/pull/11042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11042&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295214 Stats: 25 lines in 3 files changed: 20 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11042.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11042/head:pull/11042 PR: https://git.openjdk.org/jdk/pull/11042 From mcimadamore at openjdk.org Tue Nov 8 16:30:14 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 8 Nov 2022 16:30:14 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v10] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <RdT4fePRQohvIzMlJ4rxv99nAE5ZKMqIbXFvfnBiCC8=.14e66173-56d0-4432-873f-4db2eb38335d@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision: - Revamp javadoc of Arena/MemorySession Rename MemorySession::isOwnedBy to MemorySession::isAccessibleBy Add Arena::isOwnedBy - Javadoc tweaks in MemorySession/Arena ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/e2840232..fd367106 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=08-09 Stats: 311 lines in 10 files changed: 63 ins; 28 del; 220 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From jvernee at openjdk.org Tue Nov 8 16:36:34 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 8 Nov 2022 16:36:34 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v10] In-Reply-To: <RdT4fePRQohvIzMlJ4rxv99nAE5ZKMqIbXFvfnBiCC8=.14e66173-56d0-4432-873f-4db2eb38335d@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <RdT4fePRQohvIzMlJ4rxv99nAE5ZKMqIbXFvfnBiCC8=.14e66173-56d0-4432-873f-4db2eb38335d@github.com> Message-ID: <MBVdyZl5z6iNj5IotZZhwL4KzXAMqG-tN2ijAsxaZM8=.78cbfbb9-198c-4767-88e3-f80e929e55c9@github.com> On Tue, 8 Nov 2022 16:30:14 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision: > > - Revamp javadoc of Arena/MemorySession > Rename MemorySession::isOwnedBy to MemorySession::isAccessibleBy > Add Arena::isOwnedBy > - Javadoc tweaks in MemorySession/Arena src/java.base/share/classes/java/lang/foreign/Linker.java line 119: > 117: * <li>The memory session of {@code A} is {@linkplain MemorySession#isAlive() alive}. Otherwise, the invocation throws > 118: * {@link IllegalStateException};</li> > 119: * <li>The invocation occurs in same thread as the one {@linkplain MemorySession#isAccessibleBy(Thread) owning} the memory session of {@code R}, Suggestion: * <li>The invocation occurs in same thread as the one {@linkplain MemorySession#isAccessibleBy(Thread) owning} the memory session of {@code A}, ------------- PR: https://git.openjdk.org/jdk/pull/10872 From luhenry at openjdk.org Tue Nov 8 17:04:32 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 8 Nov 2022 17:04:32 GMT Subject: RFR: 8296515: RISC-V: Optimized MaxReductionV/MinReductionV/AddReductionV node implementation [v2] In-Reply-To: <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> Message-ID: <9ufYlsyCCQ7XOMNUHp3pb2IfXaoWg6vVK321Ejo8KIw=.dcffe785-bf17-4690-a0ea-45c706118e22@github.com> On Tue, 8 Nov 2022 11:41:22 GMT, Gui Cao <gcao at openjdk.org> wrote: >> HI, >> >> The MaxReductionV, MinReductionV, AddReductionV nodes currently implemented by riscv rvv can be implemented by calling shared functions, and the T_BYTE and T_SHORT types in the MaxReductionV and MinReductionV node implementations can also be implemented in the same way as the T_INT type. 
>> >> Please take a look and have some reviews. Thanks a lot. >> >> ## Testing: >> - hotspot and jdk tier1 on unmatched board without new failures >> - test/jdk/jdk/incubator/vector/* with fastdebug on qemu > > Gui Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Use the same predicate as reduce_addI > - Remove the REDUCTION_OP enumeration type and use Opcode to represent the operation Marked as reviewed by luhenry (Author). ------------- PR: https://git.openjdk.org/jdk/pull/11036 From coleenp at openjdk.org Tue Nov 8 17:07:33 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 17:07:33 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <SerFBJVbiE59UN3LdVl8Bph-QB1ROvd0o_C-qwTEE2M=.a7e7a7aa-8824-4230-8a9b-7aad7fc06f3d@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file We call compute_java_loaders with upcall to ClassLoader.getSystemClassLoader() after initPhase3 but before JVMTI live phase: https://github.com/coleenp/jdk/blob/master/src/hotspot/share/runtime/threads.cpp#L731 which is when jvmti calls this: https://github.com/coleenp/jdk/blob/master/src/hotspot/share/prims/jvmtiEnv.cpp#L703 Can one create a class loader to override this function, and does one expect the JVM to synchronize it on the system class loader object (returned by getSystemClassLoader)? This feels too remote to try to explain in a CSR. I'll look to see if we have tests for this case. ------------- PR: https://git.openjdk.org/jdk/pull/11023 From redestad at openjdk.org Tue Nov 8 17:20:39 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 8 Nov 2022 17:20:39 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v8] In-Reply-To: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> Message-ID: <DDCl4mNCXcI7pt9pVEabmJ1jOgf-H8E2BSzOSkUbV2M=.0bc87588-c92f-4966-9432-1cc61c16fddd@github.com> > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). 
It seems somewhat premature to parameterize this up front.
>
> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>
> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>
> Baseline:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>
> I.e. no measurable overhead compared to baseline even for `size == 1`.
>
> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>
> Benchmark for `Arrays.hashCode`:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>
> Baseline:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>
>
> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.

Claes Redestad has updated the pull request incrementally with five additional commits since the last revision:

 - Merge pull request #2 from luhenry/dev/cl4es/8282664-polyhash
   Unroll + Reorder BBs
 - fixup!
Handle size=0 and size=1 in Java
 - Handle size=0 and size=1 in Java
 - reorder BB to do single scalar first to avoid slowdown of short arrays, longer arrays jumps will be amortized by speedups
 - Unroll loop for cnt1 < 32

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10847/files
  - new: https://git.openjdk.org/jdk/pull/10847/files/6f49b5aa..a4d898a3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=06-07

  Stats: 216 lines in 7 files changed: 154 ins; 19 del; 43 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From stuefe at openjdk.org Tue Nov 8 17:58:06 2022
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 8 Nov 2022 17:58:06 GMT
Subject: RFR: JDK-8296437: NMT incurs costs if disabled
Message-ID: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com>

While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled.

The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air :

    #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \
                        NativeCallStack(0) : NativeCallStack::empty_stack())
    #define CALLER_PC  ((MemTracker::tracking_level() == NMT_detail) ? \
                        NativeCallStack(1) : NativeCallStack::empty_stack())

and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc:

    void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack);

In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker.
The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()).

However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots):

    0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>:
    ...
    # Load tracking level
    cb9a77: 48 8d 1d 02 35 78 00    lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE>
    cb9a7e: 8b 03                   mov (%rbx),%eax
    # detail (3) tracking?
    cb9a80: 83 f8 03                cmp $0x3,%eax
    # yes: go and collect callstack
    cb9a83: 0f 84 57 01 00 00       je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180>
    # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals:
    cb9a89: 48 8d 05 30 44 78 00    lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE>
    cb9a90: f3 0f 6f 00             movdqu (%rax),%xmm0
    cb9a94: f3 0f 6f 48 10          movdqu 0x10(%rax),%xmm1
    cb9a99: 0f 11 45 c0             movups %xmm0,-0x40(%rbp)
    cb9a9d: 0f 11 4d d0             movups %xmm1,-0x30(%rbp)
    ...
    # do the actual malloc:
    cb9af8: e8 c3 40 5d ff          callq 28dbc0 <malloc@plt>
    # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX):
    cb9b0f: 48 8d 4d c0             lea -0x40(%rbp),%rcx
    ...
    cb9b19: e8 f2 b7 f3 ff          callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack>

This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled.

---------------------

The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only.
This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC.

In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway:

    0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>:
    ...
    # load tracking level
    cb9907: 48 8d 1d 72 46 78 00    lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE>
    cb990e: 8b 03                   mov (%rbx),%eax
    # detail (3) tracking?
    cb9910: 83 f8 03                cmp $0x3,%eax
    # yes: go and collect callstack
    cb9913: 0f 84 37 01 00 00       je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160>
    # no: nothing more to do ...
    ...
    # do the actual malloc:
    cb9af8: e8 c3 40 5d ff          callq 28dbc0 <malloc@plt>
    ...
    # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway:
    cb9987: 48 8d 4d c0             lea -0x40(%rbp),%rcx
    ..
    cb9991: e8 ba b8 f3 ff          callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack>

There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent.

--------------

Results:

When profiling, I see os::malloc now needs less cycles, and the hotspot around the xmm instructions is not there anymore.
------------- Commit messages: - JDK-8296437-CURRENT_PC_costly-even-if-NMT-off Changes: https://git.openjdk.org/jdk/pull/11040/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11040&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296437 Stats: 20 lines in 5 files changed: 11 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/11040.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11040/head:pull/11040 PR: https://git.openjdk.org/jdk/pull/11040 From alanb at openjdk.org Tue Nov 8 18:01:27 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 8 Nov 2022 18:01:27 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <SerFBJVbiE59UN3LdVl8Bph-QB1ROvd0o_C-qwTEE2M=.a7e7a7aa-8824-4230-8a9b-7aad7fc06f3d@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> <SerFBJVbiE59UN3LdVl8Bph-QB1ROvd0o_C-qwTEE2M=.a7e7a7aa-8824-4230-8a9b-7aad7fc06f3d@github.com> Message-ID: <C6g_vih4gr1d3xmPNG4mPvxfS2MBqZFi5fUvZWIzDus=.7b90e137-917b-43cb-b414-b7bb0450491e@github.com> On Tue, 8 Nov 2022 17:03:05 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > Can one create a class loader to override this function, and does one expect the JVM to synchronize it on the system class loader object (returned by getSystemClassLoader)? This feels too remote to try to explain in a CSR. > > I'll look to see if we have tests for this case. In most cases, system class loader == the application class loader. It should be very rare to run with -Djava.system.class.loader=... but we know of a few application servers that do use it. The behavior prior to JDK 9 was to always call the app class loader's appendToClassPathForInstrumentation. This changed in JDK 9 (see JDK-8160950) to call the custom system class loader's appendToClassPathForInstrumentation. 
The changes for JDK-8160950 added test/java/lang/instrument/CustomSystemLoader so running the test/java/lang/instrument tests would be good. I'd forgotten about that change when initially commenting here. There isn't anything in the API docs about synchronization on the class loader object, nothing should be depending on that but it wouldn't do any harm to note the change in a release note. ------------- PR: https://git.openjdk.org/jdk/pull/11023 From jvernee at openjdk.org Tue Nov 8 18:18:08 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 8 Nov 2022 18:18:08 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v11] In-Reply-To: <aX-NZP-MEh0lqwPI1IGvgSWWQwST66mnAevZVDMRWC4=.95e3dfe0-ec5a-4fe9-87bb-385eb7638ef8@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <aX-NZP-MEh0lqwPI1IGvgSWWQwST66mnAevZVDMRWC4=.95e3dfe0-ec5a-4fe9-87bb-385eb7638ef8@github.com> Message-ID: <u4J7jObzndMfPVOmUQD6y33jBy6g3tyf9OL3IVNheEY=.7595f8ce-8ffd-4174-afc8-2a73d94d8b80@github.com> On Tue, 8 Nov 2022 18:14:21 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision: > > - Address review comments > - More javadoc tweaks Marked as reviewed by jvernee (Reviewer). src/java.base/share/classes/java/lang/foreign/MemorySession.java line 67: > 65: * cannot be easily determined. As shown in the example above, a memory session that is managed implicitly cannot end > 66: * if a program references to one or more segments associated with that session. 
This means that memory segments associated > 67: * with implicitly managed can be safely {@linkplain #isAccessibleBy(Thread) accessed} from multiple threads. Suggestion: * with implicitly managed sessions can be safely {@linkplain #isAccessibleBy(Thread) accessed} from multiple threads. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Tue Nov 8 18:18:07 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 8 Nov 2022 18:18:07 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v11] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <aX-NZP-MEh0lqwPI1IGvgSWWQwST66mnAevZVDMRWC4=.95e3dfe0-ec5a-4fe9-87bb-385eb7638ef8@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision: - Address review comments - More javadoc tweaks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/fd367106..bb39bef3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=09-10 Stats: 190 lines in 21 files changed: 106 ins; 34 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Tue Nov 8 18:28:40 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 8 Nov 2022 18:28:40 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v12] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <3_cNn7GNS1M_3ouTex59atRvhCZX3_-cTeDtlGsLfuk=.4699a537-73b1-427a-a42c-a81ba874d658@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Update src/java.base/share/classes/java/lang/foreign/MemorySession.java Co-authored-by: Jorn Vernee <JornVernee at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/bb39bef3..fff83ca8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From jvernee at openjdk.org Tue Nov 8 18:47:52 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 8 Nov 2022 18:47:52 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 Message-ID: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. This is split off from the main JEP integration to make reviewing easier. This includes the following patches: 1. https://github.com/openjdk/panama-foreign/pull/698 2. https://github.com/openjdk/panama-foreign/pull/699 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 4. https://github.com/openjdk/panama-foreign/pull/740 5. https://github.com/openjdk/panama-foreign/pull/746 6. https://github.com/openjdk/panama-foreign/pull/742 7. 
https://github.com/openjdk/panama-foreign/pull/743 Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. Please refer to the PR of each individual patch for a more detailed description. ------------- Depends on: https://git.openjdk.org/jdk/pull/10872 Commit messages: - Rename CAPTURED_STATE_MASK stub location to CAPTURED_STATE_BUFFER - fix TestCaptureCallState - add stubs - 8295353: Mark Register v24 as Volatile in Foreign Function & Memory C ABI Definition - 8294970: Add linker option for saving thread-locals that the VM can overwrite - 8275584: Incorrect stack spilling in CallArranger on MacOS/AArch64 - 8295265: Refactor handling of special values passed to stubs - VMStorage to record - 8275644: Replace VMReg in shuffling code with something more fine grained. 
- 8291913: Remove the TraceOptimizedUpcallStubs flag Changes: https://git.openjdk.org/jdk/pull/11019/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296477 Stats: 2766 lines in 67 files changed: 1861 ins; 315 del; 590 mod Patch: https://git.openjdk.org/jdk/pull/11019.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11019/head:pull/11019 PR: https://git.openjdk.org/jdk/pull/11019 From rriggs at openjdk.org Tue Nov 8 19:22:10 2022 From: rriggs at openjdk.org (Roger Riggs) Date: Tue, 8 Nov 2022 19:22:10 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v8] In-Reply-To: <DDCl4mNCXcI7pt9pVEabmJ1jOgf-H8E2BSzOSkUbV2M=.0bc87588-c92f-4966-9432-1cc61c16fddd@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <DDCl4mNCXcI7pt9pVEabmJ1jOgf-H8E2BSzOSkUbV2M=.0bc87588-c92f-4966-9432-1cc61c16fddd@github.com> Message-ID: <VI_Cwb7qLL07SS7Yvw1TGib2p-fu0kRSuuTHQiqdnog=.c16fc8e6-3b08-4d17-9fae-349dc7bd7f1e@github.com> On Tue, 8 Nov 2022 17:20:39 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. 
We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>>
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with five additional commits since the last revision:
>
> - Merge pull request #2 from luhenry/dev/cl4es/8282664-polyhash
>
> Unroll + Reorder BBs
> - fixup!
Handle size=0 and size=1 in Java > - Handle size=0 and size=1 in Java > - reorder BB to do single scalar first to avoid slowdown of short arrays, longer arrays jumps will be amortized by speedups > - Unroll loop for cnt1 < 32 src/java.base/share/classes/jdk/internal/module/ModuleHashes.java line 141: > 139: * > 140: * @param supplier supplies the module reader to access the module content > 141: * Revert, there are no other changes to ModuleHashes.java ------------- PR: https://git.openjdk.org/jdk/pull/10847 From dlong at openjdk.org Tue Nov 8 20:26:20 2022 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Nov 2022 20:26:20 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> Message-ID: <L6iFC_qArN0CIu1SNeQ_t6Q9OfiTUXVMEbYRbD91n2c=.16065b74-f168-4fd0-93a1-34b47c4c7c00@github.com> On Mon, 7 Nov 2022 13:24:26 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. > > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. 
>
> Enables the following
> ```C++
> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized())
>   os::print_register_info_header(st, _context);
>
>   REENTRANT_LOOP_START(os::print_nth_register_info_max_index())
>     // decode register contents if possible
>     ResourceMark rm(_thread);
>     os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context);
>   REENTRANT_LOOP_END
>
>   st->cr();
> ```
>
> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local)

It would be nice to know why register printing crashes, and if there is something we could do to make it more robust, like checking os::is_readable_pointer().

-------------

PR: https://git.openjdk.org/jdk/pull/11017

From duke at openjdk.org Tue Nov 8 21:41:58 2022
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Tue, 8 Nov 2022 21:41:58 GMT
Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v8]
In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com>
References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com>
Message-ID: <hvPoXsSiGzJ9BPfCYeojijb4C7mkJ4lVMyWIOpZM5Fc=.ca0faad6-dcd8-4cf1-9f47-3213a15252f7@github.com>

> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>
> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
> - Added a JMH perf test.
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>
> Perf before:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>
> and after:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s

Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:

 - make UsePolyIntrinsics option diagnostic
 - Merge remote-tracking branch 'origin/master' into avx512-poly
 - iwanowww review
 - Merge remote-tracking branch 'origin/master' into avx512-poly
 - address Jamil's review
 - invalidkeyexception and some review comments
 - extra whitespace character
 - assembler checks and test case fixes
 - Merge remote-tracking branch 'origin/master' into avx512-poly
 - Merge remote-tracking branch 'origin' into avx512-poly
 - ...
and 5 more: https://git.openjdk.org/jdk/compare/0ee25de7...120247d5 ------------- Changes: https://git.openjdk.org/jdk/pull/10582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=07 Stats: 1814 lines in 32 files changed: 1777 ins; 3 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Tue Nov 8 21:42:03 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 8 Nov 2022 21:42:03 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7] In-Reply-To: <yakJWzO8mmn5VJ0WpAEAwo1DHqpq2xkkD-oQol0gEhA=.dc776ad9-a666-4928-b59c-087af1052508@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <tLmwiB4fMB94Svqu8YbprTU5DZjwVSoEdKys-Nn5N0M=.32d50548-56bf-4d77-8df9-cc86cec38023@github.com> <f0Bv-td_2zmCQ7F8RjVS-O5SK36DolXJt83FD34zsuY=.d5336884-e8ae-4f01-bf23-47f6fec785c7@github.com> <yakJWzO8mmn5VJ0WpAEAwo1DHqpq2xkkD-oQol0gEhA=.dc776ad9-a666-4928-b59c-087af1052508@github.com> Message-ID: <Cr5sMZZzFMDx4R0I_zUtQHeKVdRx0ctdVWgl9VJQ6xA=.5ae8e681-acc2-4044-b29a-4ff70ebf98dd@github.com> On Fri, 4 Nov 2022 17:25:16 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> src/hotspot/share/opto/library_call.cpp line 7036: >> >>> 7034: assert(r_start, "r array is NULL"); >>> 7035: >>> 7036: Node* call = make_runtime_call(RC_LEAF, >> >> Can we safely change this to `RC_LEAF | RC_NO_FP`? For the ChaCha20 block intrinsic I'm working on I've been using that parameter because I'm not touching the FP registers and that looks to be the case here (though your intrinsic is a lot more complicated than mine so I may have missed something). I believe the GHASH and AES library call routines also call `make_runtime_call()` in this way. > > Makes sense to me, will put it in and re-test (no fp registers anywhere in the intrinsic). Thanks! 
done ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Tue Nov 8 21:42:08 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 8 Nov 2022 21:42:08 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6] In-Reply-To: <lyp-yeLzhgxWaDGBB7GvJPgtMn5u_1k823-GZLIZPKM=.fbdc9545-5e30-4919-880c-f377b13d31ee@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <hX0rCTuGOWiCCGiKC7anEoiNosHutI1kiWb9Z-plUSY=.d7a8c833-f3dd-4f3a-98bf-1975109de48c@github.com> <lyp-yeLzhgxWaDGBB7GvJPgtMn5u_1k823-GZLIZPKM=.fbdc9545-5e30-4919-880c-f377b13d31ee@github.com> Message-ID: <xOU5sYvutNsfaUK4Elp2mPQ1bXTLb0jksx97Z4zfRBE=.1c46e399-69d7-4ed4-9635-41129c0f379d@github.com> On Tue, 1 Nov 2022 23:21:57 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> invalidkeyexception and some review comments > > src/hotspot/share/runtime/globals.hpp line 241: > >> 239: "Use intrinsics for java.util.Base64") \ >> 240: \ >> 241: product(bool, UsePolyIntrinsics, false, \ > > I'm not a fan of introducing new flags for individual intrinsics (there's already `-XX:DisableIntrinsic=_name` specifically for that), but since we already have many, shouldn't it be declared as a diagnostic flag, at least? Started removing the option, but it's quite convenient to have the boolean global, so just made the option diagnostic.
"done" ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Tue Nov 8 22:03:20 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 8 Nov 2022 22:03:20 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6] In-Reply-To: <xVrQBy6-ShiXGeObsxEoYonfCQ8r7f7VebUjJI1zP64=.5ffcb66d-5662-4fd0-8b51-7ca621a69757@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <hX0rCTuGOWiCCGiKC7anEoiNosHutI1kiWb9Z-plUSY=.d7a8c833-f3dd-4f3a-98bf-1975109de48c@github.com> <lyp-yeLzhgxWaDGBB7GvJPgtMn5u_1k823-GZLIZPKM=.fbdc9545-5e30-4919-880c-f377b13d31ee@github.com> <xVrQBy6-ShiXGeObsxEoYonfCQ8r7f7VebUjJI1zP64=.5ffcb66d-5662-4fd0-8b51-7ca621a69757@github.com> Message-ID: <hFUaLY8cUCfXRbJCKtDcefgrwZvl2X53q1QPVUEosqc=.7980aa45-d193-49cc-b3f1-15a8eb45576e@github.com> On Tue, 1 Nov 2022 23:49:17 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2002: >> >>> 2000: } >>> 2001: >>> 2002: address StubGenerator::generate_poly1305_masksCP() { >> >> I suggest to turn it into a C++ literal constant and move the declaration next to `poly1305_process_blocks_avx512` where they are used. As an example, here's how it is handled in GHASH stubs: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64_ghash.cpp#L35 >> >> That would allow to avoid to simplify the code a bit (no need in `StubRoutines::x86::_poly1305_mask_addr`/`poly1305_mask_addr()` and no need to generate the constants during VM startup). >> >> You could split it into 3 constants, but then using a single base register (`polyCP`) won't work anymore. >> Thinking more about it, I'm not sure why you can't just do the split and use address literals instead to access individual constants (and repurpose `r13` to be used as a scratch register when RIP-relative addressing mode doesn't work). 
> > The case of AES stubs may be even a better fit here: > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp#L47 > > It doesn't use/introduce any shared constants, so declaring a constant and a local accessor (to save on pointer to address casts at use sites) is enough. @iwanowww moved to StubGenerator as suggested.. moving functions to the stubGenerator_x86_64.hpp header doesn't seem 'clean' but I think that's the pattern. The constant pool.. stared at it for a while and ended up keeping it mostly intact (its now a static function, not a member function; header bit cleaner; followed AES pattern). Did not split it up into individual constants. The main 'problem' is that `Address` and `ExternalAddress` are not compatible. Most instructions do not take `AddressLiteral`, so can't use `ExternalAddress` to refer to those constants. (If I did get the instructions I use to take `AddressLiteral`, I think we would end up with more `lea(rscratch)`s generated; but that's more of a silver-lining) I also thought of loading constants at run-time, (load and replicate for vector.. what I mentioned in my comment above) but that seems needlessly complicated in hindsight.. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From mcimadamore at openjdk.org Tue Nov 8 22:07:07 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 8 Nov 2022 22:07:07 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v13] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <W6HEyQt73yjVqG0oFtAcGdHMvsW4Acfm-EQFZxaHYHA=.8d12840d-1956-4e93-893a-25256efe3d5b@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. 
A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: More javadoc fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/fff83ca8..9be0c97b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=11-12 Stats: 10 lines in 3 files changed: 0 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From coleenp at openjdk.org Tue Nov 8 22:10:09 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 8 Nov 2022 22:10:09 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <VD0zW_j55V_Tyk9YFRxJfiMR-a3HnChbPJzt7iE-cvo=.0c42b9a6-beda-405b-9ed5-af7f8c3af6d1@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file I could add a release note to say something like: When running a Java application with the options "-javaagent:myagent.jar -Djava.system.class.loader=MyClassLoader" the myagent.jar is added to the custom system class loader rather than the application class loader. The JVM no longer synchronizes on the custom class loader while calling appendToClassPathForInstrumentation. The appendToClassPathForInstrumentation method in the custom class loader must synchronize appending to the class path. ------------- PR: https://git.openjdk.org/jdk/pull/11023 From mcimadamore at openjdk.org Tue Nov 8 22:12:46 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 8 Nov 2022 22:12:46 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v14] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <lEpO2Kc_JsL9bhFgf_zuibyNPm-O-zzgL9oJ0kxaDQY=.2027960f-2d7a-4ed6-b844-402a7bb478a2@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>
> [1] - https://openjdk.org/jeps/434

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Fix typo

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10872/files
  - new: https://git.openjdk.org/jdk/pull/10872/files/9be0c97b..df29e6a0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=12-13

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10872.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872

PR: https://git.openjdk.org/jdk/pull/10872

From duke at openjdk.org Tue Nov 8 23:21:58 2022
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Tue, 8 Nov 2022 23:21:58 GMT
Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]
In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com>
References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com>
Message-ID: <Okq3ER5gNRyGhWX8ZRWmxDr0hyzs0DJ15EHul8xoloA=.6c7980ce-89da-4efa-80af-83e1da66945b@github.com>

> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>
> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
> - Added a JMH perf test.
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>
> Perf before:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>
> and after:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  fix 32-bit build

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10582/files
  - new: https://git.openjdk.org/jdk/pull/10582/files/120247d5..da560452

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=07-08

  Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/10582.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582

PR: https://git.openjdk.org/jdk/pull/10582

From dholmes at openjdk.org Tue Nov 8 23:38:17 2022
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 8 Nov 2022 23:38:17 GMT
Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v5]
In-Reply-To: <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com>
References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com>
Message-ID: <y02Dl87_g-dbnGjXtN0DOD3fgZy7PTPPLRVENr6man4=.ceaaf2a3-6b7c-473a-8a99-6117ed1b6ae9@github.com>

On Tue,
8 Nov 2022 14:55:17 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Forgot a null check. src/hotspot/share/prims/jvmtiEnvBase.cpp line 540: > 538: > 539: jthread * > 540: JvmtiEnvBase::new_jthreadArray(int length, Handle *handles) { Shouldn't this method need to cast the return value to `jthread*`? And potentially shouldn't all the jobject's now be jthread's? ------------- PR: https://git.openjdk.org/jdk/pull/11033 From dholmes at openjdk.org Tue Nov 8 23:47:38 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 8 Nov 2022 23:47:38 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <VD0zW_j55V_Tyk9YFRxJfiMR-a3HnChbPJzt7iE-cvo=.0c42b9a6-beda-405b-9ed5-af7f8c3af6d1@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> <VD0zW_j55V_Tyk9YFRxJfiMR-a3HnChbPJzt7iE-cvo=.0c42b9a6-beda-405b-9ed5-af7f8c3af6d1@github.com> Message-ID: <J0bLxJBGH9VGh3qQDjg-GybaEo41W5rV_BYnJplr3G4=.d4a87ab5-0a4e-4785-b9b3-51960c06c969@github.com> On Tue, 8 Nov 2022 22:08:06 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > in the custom class loader must synchronize appending to the class path I would say: "in the custom class loader must append to the class path in a thread-safe manner." 
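
[Editorial sketch, not part of the original message.] The requirement discussed above, that a custom system class loader append to its class path in a thread-safe manner, could look roughly like the following. This is a minimal sketch under stated assumptions: the class name `MyClassLoader` is borrowed from the proposed release note, the loader is assumed to extend `URLClassLoader`, and using `synchronized` on the method is only one possible way to make the append thread-safe. A real system class loader selected with the `java.system.class.loader` property would typically be declared `public` in its own source file.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Hypothetical custom system class loader, for illustration only.
class MyClassLoader extends URLClassLoader {

    // A public constructor taking the parent loader is the shape expected
    // when a loader is installed via -Djava.system.class.loader=...
    public MyClassLoader(ClassLoader parent) {
        super(new URL[0], parent);
    }

    // Looked up by name by the instrumentation support when an agent JAR
    // has to be appended to the system class path. The loader itself is now
    // responsible for thread safety, so the method is synchronized here.
    public synchronized void appendToClassPathForInstrumentation(String path) {
        try {
            addURL(Paths.get(path).toUri().toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad class path entry: " + path, e);
        }
    }
}
```

Whether synchronizing on the loader, on a private lock, or relying on a concurrent collection is preferable depends on what other locking the loader already does; the sketch only shows that the responsibility now sits in Java code rather than in the JVM.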
------------- PR: https://git.openjdk.org/jdk/pull/11023 From redestad at openjdk.org Tue Nov 8 23:48:22 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 8 Nov 2022 23:48:22 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v9] In-Reply-To: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> Message-ID: <gI7FMBYJotjnzPUDzLHCXROrrrxwnBRc1rJ5odyegk4=.bac9cc2a-f6a4-4541-9e28-956675052115@github.com> > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 
7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 
0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 55 commits: - Revert accidental ModuleHashes change - Merge branch 'master' into 8282664-polyhash - Merge pull request #2 from luhenry/dev/cl4es/8282664-polyhash Unroll + Reorder BBs - fixup! Handle size=0 and size=1 in Java - Handle size=0 and size=1 in Java - reorder BB to do single scalar first to avoid slowdown of short arrays, longer arrays jumps will be amortized by speedups - Unroll loop for cnt1 < 32 - Merge pull request #1 from luhenry/dev/cl4es/8282664-polyhash Switch to forward approach for vectorization - Fix vector loop - fix indexing - ...
and 45 more: https://git.openjdk.org/jdk/compare/dd5d4df5...853a7575 ------------- Changes: https://git.openjdk.org/jdk/pull/10847/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=08 Stats: 1186 lines in 33 files changed: 1130 ins; 9 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Tue Nov 8 23:48:24 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 8 Nov 2022 23:48:24 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v8] In-Reply-To: <VI_Cwb7qLL07SS7Yvw1TGib2p-fu0kRSuuTHQiqdnog=.c16fc8e6-3b08-4d17-9fae-349dc7bd7f1e@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <DDCl4mNCXcI7pt9pVEabmJ1jOgf-H8E2BSzOSkUbV2M=.0bc87588-c92f-4966-9432-1cc61c16fddd@github.com> <VI_Cwb7qLL07SS7Yvw1TGib2p-fu0kRSuuTHQiqdnog=.c16fc8e6-3b08-4d17-9fae-349dc7bd7f1e@github.com> Message-ID: <XfYk2SsFzoku8eU6sjGrvLIpOqMM3F-02RN2BX_C2Bo=.5ea0485c-c28c-4522-a678-5dc1d0f07e41@github.com> On Tue, 8 Nov 2022 19:14:25 GMT, Roger Riggs <rriggs at openjdk.org> wrote: >> Claes Redestad has updated the pull request incrementally with five additional commits since the last revision: >> >> - Merge pull request #2 from luhenry/dev/cl4es/8282664-polyhash >> >> Unroll + Reorder BBs >> - fixup! Handle size=0 and size=1 in Java >> - Handle size=0 and size=1 in Java >> - reorder BB to do single scalar first to avoid slowdown of short arrays, longer arrays jumps will be amortized by speedups >> - Unroll loop for cnt1 < 32 > > src/java.base/share/classes/jdk/internal/module/ModuleHashes.java line 141: > >> 139: * >> 140: * @param supplier supplies the module reader to access the module content >> 141: * > > Revert, there are no other changes to ModuleHashes.java Fixed. 
------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Tue Nov 8 23:57:02 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 8 Nov 2022 23:57:02 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: <tUUED_AhQ-59DfLrgFc1EijmNCuhz4uF-v15WZGwwQ8=.4e8e62f5-5f27-4a47-a8f0-ebf6adb41e20@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <tUUED_AhQ-59DfLrgFc1EijmNCuhz4uF-v15WZGwwQ8=.4e8e62f5-5f27-4a47-a8f0-ebf6adb41e20@github.com> Message-ID: <e5yc8r8juyKkyBKZMxTbhHBigz4_0nC-fwHHSOXxpNo=.f4c2e10a-0c43-4679-85bd-77833cade7f1@github.com> On Tue, 25 Oct 2022 16:03:28 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ±
0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ±
62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > I did a quick write up explaining the approach at https://gist.github.com/luhenry/2fc408be6f906ef79aaf4115525b9d0c. Also, you can find details in @richardstartin's [blog post](https://richardstartin.github.io/posts/vectorised-polynomial-hash-codes) Most optimizations for small arrays are now back - thanks @luhenry! - I'll do a pass tomorrow and see if there's something we can simplify or enhance before calling it done.
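For readers following the thread, the idea behind the unrolling can be sketched in plain Java. This is only a minimal sketch, not the code in the PR: the class name `PolyHashSketch`, the method names, and the unroll factor of four are hypothetical choices made for illustration. It relies only on what is stated above, namely a factor of 31 and a start value of one for `Arrays` or zero for `String`.

```java
import java.util.Arrays;

// Hypothetical illustration; none of these names come from the PR.
final class PolyHashSketch {
    // Reference loop: h = 31 * h + c for each element, starting from `initial`.
    static int scalar(char[] a, int initial) {
        int h = initial;
        for (char c : a) {
            h = 31 * h + c;
        }
        return h;
    }

    // Same result, but four elements are folded in per iteration, so the
    // multiplications within one iteration no longer depend on each other.
    static int unrolled(char[] a, int initial) {
        int h = initial;
        int i = 0;
        int bound = a.length & ~3;      // largest multiple of four <= length
        for (; i < bound; i += 4) {
            h = 923521 * h              // 31^4
              + 29791 * a[i]            // 31^3
              + 961 * a[i + 1]          // 31^2
              + 31 * a[i + 2]
              + a[i + 3];
        }
        for (; i < a.length; i++) {     // scalar tail for the remaining elements
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        char[] data = "polynomial hash".toCharArray();
        // Start value 0 should match String.hashCode, start value 1 Arrays.hashCode.
        System.out.println(unrolled(data, 0) == new String(data).hashCode());
        System.out.println(unrolled(data, 1) == Arrays.hashCode(data));
    }
}
```

Because `int` arithmetic wraps, the regrouped form agrees with the reference loop for every input. The vectorized variants discussed in the thread apply the same regrouping across vector lanes rather than across scalar terms.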
------------- PR: https://git.openjdk.org/jdk/pull/10847 From vlivanov at openjdk.org Wed Nov 9 00:29:32 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 9 Nov 2022 00:29:32 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9] In-Reply-To: <Okq3ER5gNRyGhWX8ZRWmxDr0hyzs0DJ15EHul8xoloA=.6c7980ce-89da-4efa-80af-83e1da66945b@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <Okq3ER5gNRyGhWX8ZRWmxDr0hyzs0DJ15EHul8xoloA=.6c7980ce-89da-4efa-80af-83e1da66945b@github.com> Message-ID: <cqMESA9bq6nErMs9ckafTqczQEz33XgEVax014m0hTM=.26c6bdfe-afb6-4406-941b-fc2ce1030389@github.com> On Tue, 8 Nov 2022 23:21:58 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ±
154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > fix 32-bit build src/hotspot/cpu/x86/macroAssembler_x86.hpp line 970: > 968: > 969: void addmq(int disp, Register r1, Register r2); > 970: Leftover formatting changes. src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 95: > 93: > 94: // OFFSET 64: mask_44 > 95: 0xfffffffffff, 0xfffffffffff, Please keep leading zeroes explicit in the constants. src/hotspot/cpu/x86/stubRoutines_x86.cpp line 2: > 1: /* > 2: * Copyright (c) 2013, 2022, Oracle and/or its affiliates. All rights reserved. No changes remain in this file. src/hotspot/share/opto/library_call.cpp line 7014: > 7012: const TypeKlassPtr* rklass = TypeKlassPtr::make(instklass_ImmutableElement); > 7013: const TypeOopPtr* rtype = rklass->as_instance_type()->cast_to_ptr_type(TypePtr::NotNull); > 7014: Node* rObj = new CheckCastPPNode(control(), rFace, rtype); FTR it's an unsafe cast since it doesn't involve a runtime check from `IntegerModuloP` to `ImmutableElement`. Please lift as many checks into the Java wrapper as possible. src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175: > 173: > 174: int blockMultipleLength = len & (~(BLOCK_LENGTH-1)); > 175: Objects.checkFromIndexSize(offset, blockMultipleLength, input.length); I suggest moving the checks into `processMultipleBlocks`, introducing a new static helper method specifically for the intrinsic part, and lifting more logic (e.g., field loads) from the intrinsic into Java code.
As an additional step, you can switch to double-register addressing mode (base + offset) for input data (`input`, `alimbs`, `rlimbs`) and simplify the intrinsic part even more (will involve a switch from `array_element_address` to `make_unsafe_address`). ------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Wed Nov 9 00:42:32 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 9 Nov 2022 00:42:32 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6] In-Reply-To: <hFUaLY8cUCfXRbJCKtDcefgrwZvl2X53q1QPVUEosqc=.7980aa45-d193-49cc-b3f1-15a8eb45576e@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <hX0rCTuGOWiCCGiKC7anEoiNosHutI1kiWb9Z-plUSY=.d7a8c833-f3dd-4f3a-98bf-1975109de48c@github.com> <lyp-yeLzhgxWaDGBB7GvJPgtMn5u_1k823-GZLIZPKM=.fbdc9545-5e30-4919-880c-f377b13d31ee@github.com> <xVrQBy6-ShiXGeObsxEoYonfCQ8r7f7VebUjJI1zP64=.5ffcb66d-5662-4fd0-8b51-7ca621a69757@github.com> <hFUaLY8cUCfXRbJCKtDcefgrwZvl2X53q1QPVUEosqc=.7980aa45-d193-49cc-b3f1-15a8eb45576e@github.com> Message-ID: <asa1G1rY6oVsgFHuXITPmNXCuryU8b5vJRNj8RMnZng=.f5f4e431-48c2-4896-998a-fb418992b581@github.com> On Tue, 8 Nov 2022 22:01:19 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: > Did not split it up into individual constants. The main 'problem' is that Address and ExternalAddress are not compatible. There's a reason for that and it's because RIP-relative addressing doesn't always work, so additional register may be needed. > Most instructions do not take AddressLiteral, so can't use ExternalAddress to refer to those constants. I counted 4 instructions accessing the constants (`evpandq`, `andq`, `evporq`, and `vpternlogq`) in your patch. `macroAssembler_x86.hpp` is the place for `AddressLiteral`-related overloads (there are already numerous cases present) and it's trivial to add new ones. 
> (If I did get the instructions I use to take AddressLiteral, I think we would end up with more lea(rscratch)s generated; but that's more of a silver-lining) It depends on memory layout. If constants end up placed close enough in the address space, there'll be no additional instructions generated. Anyway, it doesn't look like something important from throughput perspective. Overall, I find it clearer when the code refers to individual constants through `AddressLiteral`s, but I'm also fine with it as it is now. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From dholmes at openjdk.org Wed Nov 9 01:58:21 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 9 Nov 2022 01:58:21 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> Message-ID: <XZwERqvcDEA9iyhsPM5LljfPSocmFKRiQ_cHTYnBULY=.9647349b-94f3-4be7-8b36-1b0facfe8639@github.com> On Tue, 8 Nov 2022 16:19:47 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > Generational ZGC will need to patch nmethod instructions outside of safepoints, and guard entries into the nmethods with cross modifying code fences. This is mostly taken care of by nmethod entry barrier code. But there are a few entries that don't go through nmethod entry barriers that need fixing. In particular when entering an nmethod by returning through the stack watermark barrier. This patch ensures that whenever the stack watermark barrier exposes a new nmethod, we also ensure that a cross modify fence is executed, so that any concurrently updated instructions can be safely executed. What are the performance implications here, if any? I don't like that everyone pays for this when it is only needed for genZGC. Thanks.
src/hotspot/share/runtime/safepointMechanism.cpp line 108: > 106: if (prev_poll_word != poll_word || > 107: prev_poll_word == _poll_word_armed_value) { > 108: // After updating the poll value, we allow entering new nmethods I'm a little confused about the positioning here. The comment says "after updating the poll value", but we haven't updated yet (happens below) so don't we need the fence after that point? src/hotspot/share/runtime/safepointMechanism.cpp line 144: > 142: > 143: update_poll_values(thread); > 144: OrderAccess::cross_modify_fence(); Has this simply been moved to cover more paths? ------------- PR: https://git.openjdk.org/jdk/pull/11042 From duke at openjdk.org Wed Nov 9 02:22:01 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 9 Nov 2022 02:22:01 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6] In-Reply-To: <asa1G1rY6oVsgFHuXITPmNXCuryU8b5vJRNj8RMnZng=.f5f4e431-48c2-4896-998a-fb418992b581@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <hX0rCTuGOWiCCGiKC7anEoiNosHutI1kiWb9Z-plUSY=.d7a8c833-f3dd-4f3a-98bf-1975109de48c@github.com> <lyp-yeLzhgxWaDGBB7GvJPgtMn5u_1k823-GZLIZPKM=.fbdc9545-5e30-4919-880c-f377b13d31ee@github.com> <xVrQBy6-ShiXGeObsxEoYonfCQ8r7f7VebUjJI1zP64=.5ffcb66d-5662-4fd0-8b51-7ca621a69757@github.com> <hFUaLY8cUCfXRbJCKtDcefgrwZvl2X53q1QPVUEosqc=.7980aa45-d193-49cc-b3f1-15a8eb45576e@github.com> <asa1G1rY6oVsgFHuXITPmNXCuryU8b5vJRNj8RMnZng=.f5f4e431-48c2-4896-998a-fb418992b581@github.com> Message-ID: <EsOqIOc_ALMLS4KMPlFF7-_NDw8AkVUlla194C8zERY=.699e46f8-7c6c-4f4e-95d3-04ad0e3d13e2@github.com> On Wed, 9 Nov 2022 00:38:45 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> @iwanowww moved to StubGenerator as suggested.. moving functions to the stubGenerator_x86_64.hpp header doesn't seem 'clean' but I think that's the pattern. >> >> The constant pool.. 
stared at it for a while and ended up keeping it mostly intact (it's now a static function, not a member function; the header is a bit cleaner; followed the AES pattern). >> >> Did not split it up into individual constants. The main 'problem' is that `Address` and `ExternalAddress` are not compatible. Most instructions do not take `AddressLiteral`, so can't use `ExternalAddress` to refer to those constants. (If I did get the instructions I use to take `AddressLiteral`, I think we would end up with more `lea(rscratch)`s generated; but that's more of a silver-lining) >> >> I also thought of loading constants at run-time, (load and replicate for vector.. what I mentioned in my comment above) but that seems needlessly complicated in hindsight.. > >> Did not split it up into individual constants. The main 'problem' is that Address and ExternalAddress are not compatible. > > There's a reason for that and it's because RIP-relative addressing doesn't always work, so additional register may be needed. > >> Most instructions do not take AddressLiteral, so can't use ExternalAddress to refer to those constants. > > I counted 4 instructions accessing the constants (`evpandq`, `andq`, `evporq`, and `vpternlogq`) in your patch. > > `macroAssembler_x86.hpp` is the place for `AddressLiteral`-related overloads (there are already numerous cases present) and it's trivial to add new ones. > >> (If I did get the instructions I use to take AddressLiteral, I think we would end up with more lea(rscratch)s generated; but that's more of a silver-lining) > > It depends on memory layout. If constants end up placed close enough in the address space, there'll be no additional instructions generated. > > Anyway, it doesn't look like something important from throughput perspective. Overall, I find it clearer when the code refers to individual constants through `AddressLiteral`s, but I'm also fine with it as it is now. Makes sense to me, that would indeed be cleaner, will add a couple more overloads.
(Still getting used to what is 'clean' in this code base). ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Nov 9 02:22:04 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 9 Nov 2022 02:22:04 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9] In-Reply-To: <cqMESA9bq6nErMs9ckafTqczQEz33XgEVax014m0hTM=.26c6bdfe-afb6-4406-941b-fc2ce1030389@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <Okq3ER5gNRyGhWX8ZRWmxDr0hyzs0DJ15EHul8xoloA=.6c7980ce-89da-4efa-80af-83e1da66945b@github.com> <cqMESA9bq6nErMs9ckafTqczQEz33XgEVax014m0hTM=.26c6bdfe-afb6-4406-941b-fc2ce1030389@github.com> Message-ID: <9eURte9F6DahXze39MUQEegF0nNqZRfXh-au-mRNhpA=.b145ca11-9d61-4976-aece-4da91aa2f719@github.com> On Wed, 9 Nov 2022 00:10:48 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> fix 32-bit build > > src/hotspot/share/opto/library_call.cpp line 7014: > >> 7012: const TypeKlassPtr* rklass = TypeKlassPtr::make(instklass_ImmutableElement); >> 7013: const TypeOopPtr* rtype = rklass->as_instance_type()->cast_to_ptr_type(TypePtr::NotNull); >> 7014: Node* rObj = new CheckCastPPNode(control(), rFace, rtype); > > FTR it's an unsafe cast since it doesn't involve a runtime check from `IntegerModuloP` to `ImmutableElement`. Please lift as many checks into the Java wrapper as possible. Ah, yeah.. I quite suspected I didn't emulate all the bytecodes needed. Thanks for the info. So this is a bit of a quandary.. I had done the intrinsic more in Java before, but it slows down the non-intrinsic path (This was the discussion we were having with Jamil). In Java, the limbs are not 'accessible' per-se.. They are in a separate package hidden behind an interface.. and in a nested non-static class inside an abstract class...
It's quite well designed. It just makes what I want to do break most encapsulation. There is a method (`asByteArray`) to extract the limbs that I was previously using, but that slows down the non-intrinsic path (to be honest it slows down the intrinsic path too, but the assembler makes some of that back.. You can see that from the numbers I posted for Jamil, vs original in the PR header; originally I got 18x, and now with accessing limbs directly it's 19x). If I had some way to check 'is intrinsic available', I could at least not slow down the current code. I would still have to break the encapsulation to do the checks/casts though. It all seems less-than-perfect. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From dzhang at openjdk.org Wed Nov 9 02:37:34 2022 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 9 Nov 2022 02:37:34 GMT Subject: RFR: 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation [v2] In-Reply-To: <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> Message-ID: <bUXhkCFs4XHQ0RFi4PtjYTtC7guDfzQY4kjqUybnm9s=.0f9f37d2-f1a0-499c-b141-79103e67b592@github.com> On Tue, 8 Nov 2022 11:41:22 GMT, Gui Cao <gcao at openjdk.org> wrote: >> Hi, >> >> The MaxReductionV, MinReductionV, AddReductionV nodes currently implemented by riscv rvv can be implemented by calling shared functions, and the T_BYTE and T_SHORT types in the MaxReductionV and MinReductionV node implementations can also be implemented in the same way as the T_INT type. >> >> Please take a look and have some reviews. Thanks a lot.
>> >> ## Testing: >> - hotspot and jdk tier1 on unmatched board without new failures >> - test/jdk/jdk/incubator/vector/* with fastdebug on qemu > > Gui Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Use the same predicate as reduce_addI > - Remove the REDUCTION_OP enumeration type and use Opcode to represent the operation LGTM, thanks! ------------- Marked as reviewed by dzhang (Author). PR: https://git.openjdk.org/jdk/pull/11036 From yadongwang at openjdk.org Wed Nov 9 02:39:28 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Wed, 9 Nov 2022 02:39:28 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v7] In-Reply-To: <USqZ-hj48jmO86DWscqbw3IxSfhfVKO1V_TBJ7SJVqc=.d42b651d-9191-4134-a60f-01f7f3093c57@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <USqZ-hj48jmO86DWscqbw3IxSfhfVKO1V_TBJ7SJVqc=.d42b651d-9191-4134-a60f-01f7f3093c57@github.com> Message-ID: <c1nZeXh-1Miu710qHFX206uWEZMpjvfrLP7I25w2TKs=.77412776-afcb-416d-8a2e-b73be619236f@github.com> On Tue, 8 Nov 2022 07:34:35 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. >> >> It passes `hotspot:tier1` test suite > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > support large disp values src/hotspot/cpu/riscv/riscv.ad line 5197: > 5195: > 5196: ins_encode %{ > 5197: if ((($mem$$disp) & 0x1f) == 0) { Does this branch only check the alignment, not the allowed range? 
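To make the question concrete: the quoted condition tests only the low five bits of the displacement. The sketch below separates that alignment test from a range test. It rests on an assumption that is not stated in this thread, namely that the Zicbop prefetch instructions encode their offset as a signed 12-bit immediate whose low five bits must be zero. The class and method names are hypothetical, and the sketch says nothing about how the patch itself handles displacements that fail the test.

```java
// Hypothetical helper; the encoding limits below are an assumption about
// Zicbop, not something taken from the patch under review.
final class PrefetchDispSketch {
    // The check quoted from riscv.ad: the low five bits must be clear.
    static boolean isAligned32(long disp) {
        return (disp & 0x1f) == 0;
    }

    // Assumed range of a signed 12-bit immediate.
    static boolean fitsSimm12(long disp) {
        return disp >= -2048 && disp <= 2047;
    }

    // A displacement could be encoded directly only if both conditions hold.
    static boolean encodableDirectly(long disp) {
        return isAligned32(disp) && fitsSimm12(disp);
    }

    public static void main(String[] args) {
        // 4096 passes the alignment test but not the assumed range test.
        System.out.println(isAligned32(4096) + " " + encodableDirectly(4096));
    }
}
```

Under that assumption, a value such as 4096 would pass the alignment test alone while being too large to encode, which is the case the question is asking about.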
------------- PR: https://git.openjdk.org/jdk/pull/10884 From duke at openjdk.org Wed Nov 9 02:48:52 2022 From: duke at openjdk.org (David Schlosnagle) Date: Wed, 9 Nov 2022 02:48:52 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v9] In-Reply-To: <gI7FMBYJotjnzPUDzLHCXROrrrxwnBRc1rJ5odyegk4=.bac9cc2a-f6a4-4541-9e28-956675052115@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <gI7FMBYJotjnzPUDzLHCXROrrrxwnBRc1rJ5odyegk4=.bac9cc2a-f6a4-4541-9e28-956675052115@github.com> Message-ID: <s7h5Q4AbJGEUg8HH2ffWEbGH7aj4OwIDZ-b7C3HTfe8=.d446582a-8379-4aaa-8938-98de3c5cbb01@github.com> On Tue, 8 Nov 2022 23:48:22 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ±
0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ±
62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 55 commits: > > - Revert accidental ModuleHashes change > - Merge branch 'master' into 8282664-polyhash > - Merge pull request #2 from luhenry/dev/cl4es/8282664-polyhash > > Unroll + Reorder BBs > - fixup! Handle size=0 and size=1 in Java > - Handle size=0 and size=1 in Java > - reorder BB to do single scalar first to avoid slowdown of short arrays, longer arrays jumps will be amortized by speedups > - Unroll loop for cnt1 < 32 > - Merge pull request #1 from luhenry/dev/cl4es/8282664-polyhash > > Switch to forward approach for vectorization > - Fix vector loop > - fix indexing > - ... and 45 more: https://git.openjdk.org/jdk/compare/dd5d4df5...853a7575 Overall I am excited to see these changes land as this will be a nice boost for many string-heavy applications!
src/hotspot/share/opto/matcher.cpp line 1707:

> 1705: if (x >= _LAST_MACH_OPER) {
> 1706: fprintf(stderr, "x = %d, _LAST_MACH_OPER = %d\n", x, _LAST_MACH_OPER);
> 1707: fprintf(stderr, "dump n\n");

Should this be removed before merging?
Suggestion:

src/hotspot/share/opto/matcher.cpp line 1709:

> 1707: fprintf(stderr, "dump n\n");
> 1708: n->dump();
> 1709: fprintf(stderr, "dump svec\n");

Remove?
Suggestion:

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From yzhu at openjdk.org Wed Nov 9 03:20:04 2022
From: yzhu at openjdk.org (Yanhong Zhu)
Date: Wed, 9 Nov 2022 03:20:04 GMT
Subject: RFR: 8296301: Interpreter(RISC-V): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options
Message-ID: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com>

In this patch, count_bytecode() is modified to use "x7" as a temporary register. The patch also implements histogram_bytecode() and histogram_bytecode_pair(), which can be enabled in debug builds by setting the options PrintBytecodeHistogram and PrintBytecodePairHistogram.

The following is the output when PrintBytecodeHistogram or PrintBytecodePairHistogram is enabled.

    $ java -XX:+PrintBytecodeHistogram --version|head -n 20
    openjdk 20 2022-11-09
    OpenJDK Runtime Environment (fastdebug build 20)
    OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode)

    Histogram of 8101142 executed bytecodes:

      absolute  relative  code    name
    ----------------------------------------------------------------------
        634592     7.83%    dc    fast_aload_0
        471840     5.82%    b6    invokevirtual
        376275     4.64%    2b    aload_1
        358520     4.43%    e0    fast_iload
        332267     4.10%    de    fast_aaccess_0
        270189     3.34%    a7    goto
        249831     3.08%    19    aload
        223361     2.76%    b9    invokeinterface
        215666     2.66%    1c    iload_2
        194877     2.41%    b8    invokestatic
        192212     2.37%    2c    aload_2
        185826     2.29%    1b    iload_1

    $ java -XX:+PrintBytecodePairHistogram --version|head -n 20
    openjdk 20 2022-11-09
    OpenJDK Runtime Environment (fastdebug build 20)
    OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode)

    Histogram of 7627721 executed bytecode pairs:

      absolute  relative   codes    1st bytecode        2nd bytecode
    ----------------------------------------------------------------------
        102673    1.346%   84 a7    iinc                goto
         85429    1.120%   dc 2b    fast_aload_0        aload_1
         84394    1.106%   dc b6    fast_aload_0        invokevirtual
         73131    0.959%   b7 dc    invokespecial       fast_aload_0
         64605    0.847%   2b b6    aload_1             invokevirtual
         64086    0.840%   dc b9    fast_aload_0        invokeinterface
         63663    0.835%   b6 dc    invokevirtual       fast_aload_0
         59946    0.786%   b6 de    invokevirtual       fast_aaccess_0
         56631    0.742%   36 e0    istore              fast_iload
         51261    0.672%   b9 de    invokeinterface     fast_aaccess_0
         49556    0.650%   3a 19    astore              aload
         49106    0.644%   a7 e0    goto                fast_iload

-------------

Commit messages:
 - Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options

Changes: https://git.openjdk.org/jdk/pull/11051/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11051&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8296301
  Stats: 31 lines in 1 file changed: 22 ins; 5 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/11051.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11051/head:pull/11051

PR: https://git.openjdk.org/jdk/pull/11051

From yzhu at openjdk.org Wed Nov 9 03:57:28 2022
From: yzhu at openjdk.org (Yanhong Zhu)
Date: Wed, 9 Nov 2022 03:57:28 GMT
Subject: RFR: 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation [v2]
In-Reply-To: <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com>
References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com>
 <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com>
Message-ID: <lN_2ZBx2Q8Vz-2wIsEDev2AfR-kqQuEVqIULdKmo0NE=.925a3ba6-941d-40ef-9536-8fc260ccbeb6@github.com>

On Tue, 8 Nov 2022 11:41:22 GMT, Gui Cao <gcao at openjdk.org> wrote:

>> HI,
>>
>> The MaxReductionV, MinReductionV, AddReductionV nodes currently implemented by riscv rvv can be implemented by calling shared functions, and the T_BYTE and T_SHORT types in the MaxReductionV and MinReductionV node implementations can also be implemented in the same way as the T_INT type.
>>
>> Please take a look and have some reviews. Thanks a lot.
>>
>> ## Testing:
>> - hotspot and jdk tier1 on unmatched board without new failures
>> - test/jdk/jdk/incubator/vector/* with fastdebug on qemu
>
> Gui Cao has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Use the same predicate as reduce_addI
>  - Remove the REDUCTION_OP enumeration type and use Opcode to represent the operation

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1717:

> 1715: vredxor_vs(tmp, src2, tmp);
> 1716: break;
> 1717: case Op_MaxReductionV:

We implemented Byte/Short MaxReductionV like this patch before. But there are 2 test cases that failed:
jdk/jdk/incubator/vector/ShortMaxVectorTests.java
jdk/jdk/incubator/vector/ByteMaxVectorTests.java
The reason is that "make_reduction_input" will return INT_MAX for Byte & Short type.
Have you run into this problem?
------------- PR: https://git.openjdk.org/jdk/pull/11036 From stuefe at openjdk.org Wed Nov 9 05:12:55 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 9 Nov 2022 05:12:55 GMT Subject: RFR: JDK-8293402: hs-err file printer should reattempt stack trace printing if it fails In-Reply-To: <KKGFd2UpKLmbHnMnMeYIfdUBNb4ely3fqlxIHO6U-VI=.a772ea05-f210-4f27-b091-010ff94f04aa@github.com> References: <KKGFd2UpKLmbHnMnMeYIfdUBNb4ely3fqlxIHO6U-VI=.a772ea05-f210-4f27-b091-010ff94f04aa@github.com> Message-ID: <Rp3PfXUu7WB3uW3zDPIQBFtGPFACAy4uejtmjwXzzUY=.a9381947-12f8-40fd-9fe4-dfde0d8787a2@github.com> On Tue, 6 Sep 2022 08:05:23 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > Hi, > > may I have reviews for this small improvement. > > The call stack may be the most important part of an hs-err file. We recently introduced printing of source information (https://bugs.openjdk.org/browse/JDK-8242181) which is nice but makes stack printing more vulnerable for two reasons: > - we may crash due to a programmer error (e.g. https://bugs.openjdk.org/browse/JDK-8293344) > - we may timeout on very slow machines/file systems when the source information are parsed from the debug info (we have seen those problems in the past) > > Therefore, VMError should retry stack printing without source information if the first attempt to print failed. > > Examples: > > Step timeouts while retrieving source info: > > > 24 --------------- T H R E A D --------------- > 25 > 26 Current thread (0x00007f70ac028bd0): JavaThread "main" [_thread_in_vm, id=565259, stack(0x00007f70b0587000,0x00007f70b0688000)] > 27 > 28 Stack: [0x00007f70b0587000,0x00007f70b0688000], sp=0x00007f70b0686cf0, free space=1023k > 29 Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > 30 V [libjvm.so+0x1cd41c1] VMError::controlled_crash(int)+0x241 > 31 [timeout occurred during error reporting in step "printing native stack (with source info)"] after 30 s. 
> 32 > 33 Retrying call stack printing without source information... > 34 Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > 35 V [libjvm.so+0x1cd41c1] VMError::controlled_crash(int)+0x241 > 36 V [libjvm.so+0x11cbe45] JNI_CreateJavaVM+0x5b5 > 37 C [libjli.so+0x4013] JavaMain+0x93 > 38 C [libjli.so+0x800d] ThreadJavaMain+0xd > 39 > > > > Step crashes while retrieving source info: > > > 24 --------------- T H R E A D --------------- > 25 > 26 Current thread (0x00007fc000028bd0): JavaThread "main" [_thread_in_vm, id=569254, stack(0x00007fc00573c000,0x00007fc00583d000)] > 27 > 28 Stack: [0x00007fc00573c000,0x00007fc00583d000], sp=0x00007fc00583bcf0, free space=1023k > 29 Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > 30 V [libjvm.so+0x1cd41e1] VMError::controlled_crash(int)+0x241 > 31 [error occurred during error reporting (printing native stack (with source info)), id 0xb, SIGSEGV (0xb) at pc=0x00007fc006694d78] > 32 > 33 > 34 Retrying call stack printing without source information... > 35 Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > 36 V [libjvm.so+0x1cd41e1] VMError::controlled_crash(int)+0x241 > 37 V [libjvm.so+0x11cbe65] JNI_CreateJavaVM+0x5b5 > 38 C [libjli.so+0x4013] JavaMain+0x93 > 39 C [libjli.so+0x800d] ThreadJavaMain+0xd > > > > Thanks, Thomas Hi Axel, > > Each time we crash out in VMError, we build up the stack. We never ever unwind that stack, eg. via longjmp, since that would introduce other errors (e.g. abandoned locks). There is a natural limit to how many recursive crashes we can handle since the stack is not endless. Each secondary crash increases the risk of running into guard pages and spoiling the game for follow-up STEPs. 
> > I have no experience with stack depth being the problem in crashes (with the caveat that I have only run with this patch for a few weeks), but have experienced cases where only having a hs_err file available and having the register print_location bailing out early, where missing printing the rest has been unfavourable. I don't doubt your experience, but for us the picture is different. We (SAP) have been working with this code for ~20yrs and hs-err files are one of our main tool of support. That is why we contributed so much hardening code, and why I am so interested. For us, torn hs-err files - by running either out of time, stack space, or against the recursive limit - are a frequent annoyance. Note that with stack space, you won't notice unless you explicitly investigate - the hs-err file is just either abridged or not there at all, but you won't get a clear message. I believe the problem you want to solve is real, but maybe we can find a better solution than introducing a mass of new STEPs. Since by doing that you effectively reduce the chance of later STEPs running successfully. Stack and time are valuable resources all STEPs share, even if the recursive limit is a bit arbitrary. And at least for stack space, I don't see a solution - each crash will build up stack, and there is nothing to prevent this. > > > Therefore we limit the number of allowed recursive crashes. This limit also serves a second purpose: if we crash that often, maybe we should just stop already and let the process die. > > Fair, tough I am curious how we want to decide this limit, and why ~60 is fine but ~90 would be too much (I am guessing that most steps have no, or a very small possibility of crashing). No clue. This is quite old code from Sun times, I believe. >Maybe this should instead be solved with a general solution which stops the reporting if some retry limit is reached. How would this be different from what we are doing now? 
> > Also the common case is that it does not crash repeatedly, and if it does, that is the scenario where I personally really would want the information, because something is seriously wrong. I agree, but my conclusion differs from yours: if a STEP does X, and X crashes repeatedly out, it makes sense at some point to stop doing X, since the probability that it will crash again is quite high. It is better to stop and proceed with the next STEPs, since their output may be more valuable than just another failed printout in the crashing STEP. > But maybe not at the cost of stack overflows, if it is a problem maybe some stack address limit can used to disable reentry in reentrant steps. > > > That brings me to the second problem, which is time. When we crash, we want to go down as fast as possible, e.g. allow a server to restart the node. OTOH we want a nice hs-err file. Therefore the time error handling is allowed to take is carefully limited. See `ErrorLogTimeout`: by default 2 Minutes, though our customers usually lower this to 30 seconds or even lower. > > Each STEP has a timeout, set to a fraction of that total limit (A quarter). A quarter gives us room for 2-3 hanging STEPS and still leaves enough breathing room for the remainder of the STEPS. > > If you now increase the number of STEPS, all these calculations are off. We may hit the recursive error limit much sooner, since every individual register printout may crash. And if they hang, they may eat up the ErrorLogTimeout much sooner. So we will get more torn hs-err files with "recursive limit reached, giving up" or "timeout reached, giving up". > > The timeout problem was something I thought about as well, and I think you are correct, and that we should treat the whole reentrant step as one timeout. (Same behaviour as before). > > > Note that one particularly fragile information is the printing of debug info, e.g. function name, etc. Since that relies on some parsing of debugging information. 
In our experience that can crash out or hang often, especially if the debug info has to be read from file or network. > > Alright, I see this as an argument for reentrant steps with one timeout for all iterations of the inner loop combined. > > I've heard opinions of something similar to reentrant steps in other parts of the hs_err printing. Like stack frame printing, where you can have iterative stages where each stage builds up more detailed information, until it crashes. And then prints what information it got so far. I remember, we had this discussion before, and I brought up the same arguments. I agree with @dean-long, it would be good to know what is going wrong first before changing the base mechanism. And harden the printing code instead, e.g. via SafeFetch/os::is_readable_pointer. Because SafeFetch, in contrast to this STEP mechanism, does not build up stack. Or, if applying SafeFetch is no reasonable option, retry the whole STEP with some of the riskier options disabled, like I did here: https://github.com/openjdk/jdk/pull/10179. 
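
As a rough illustration of the "retry the whole STEP with some of the riskier options disabled" idea, here is a minimal Java sketch of the control flow only. It is not how VMError works (that is C++ code running in a signal handler); the class name `RetryingStep`, both method names, and the use of an executor are assumptions made purely for the example. The 120 s total and the one-quarter share per step are the ErrorLogTimeout figures mentioned above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical model of "retry a STEP with the risky option disabled".
final class RetryingStep {
    static final String RETRY_NOTICE =
            "Retrying call stack printing without source information...";

    // Each step may use a quarter of the total error-log budget,
    // e.g. 120 s in total leaves 30 s per step.
    static long stepTimeoutSeconds(long errorLogTimeoutSeconds) {
        return errorLogTimeoutSeconds / 4;
    }

    // Try the detailed printer first; if it throws or exceeds its budget,
    // fall back to the plain printer instead of repeating the same attempt.
    static String printStack(Callable<String> withSourceInfo,
                             Supplier<String> plain,
                             long stepTimeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(task -> {
            Thread worker = new Thread(task, "detailed-stack-printer");
            worker.setDaemon(true); // a hung printer must not block exit
            return worker;
        });
        try {
            Future<String> attempt = executor.submit(withSourceInfo);
            return attempt.get(stepTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return RETRY_NOTICE + System.lineSeparator() + plain.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return RETRY_NOTICE + System.lineSeparator() + plain.get();
        } finally {
            executor.shutdownNow(); // interrupts an attempt that is still running
        }
    }

    public static void main(String[] args) {
        String report = printStack(
                () -> { throw new IllegalStateException("source info parser crashed"); },
                () -> "V  [libjvm.so+0x1cd41c1]  VMError::controlled_crash(int)+0x241",
                TimeUnit.SECONDS.toMillis(stepTimeoutSeconds(120)));
        System.out.println(report);
    }
}
```

The point of the sketch is that the fallback runs once with the fragile part switched off, so a repeatedly failing printer cannot consume the budget of the steps that follow. Unlike the real error handler, a Java executor does not build up stack on each failure, so the stack-depth concern discussed above is not modelled here.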
Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/10179 From yzhu at openjdk.org Wed Nov 9 06:39:29 2022 From: yzhu at openjdk.org (Yanhong Zhu) Date: Wed, 9 Nov 2022 06:39:29 GMT Subject: RFR: 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation [v2] In-Reply-To: <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> Message-ID: <u6FZMjgVCDr-FObpIGU9Zi_ssE7voq9Wh-I9w5PrB14=.53258c93-5608-4a41-bf69-1ef4968c9e40@github.com> On Tue, 8 Nov 2022 11:41:22 GMT, Gui Cao <gcao at openjdk.org> wrote: >> HI, >> >> The MaxReductionV, MinReductionV, AddReductionV nodes currently implemented by riscv rvv can be implemented by calling shared functions, and the T_BYTE and T_SHORT types in the MaxReductionV and MinReductionV node implementations can also be implemented in the same way as the T_INT type. >> >> Please take a look and have some reviews. Thanks a lot. >> >> ## Testing: >> - hotspot and jdk tier1 on unmatched board without new failures >> - test/jdk/jdk/incubator/vector/* with fastdebug on qemu > > Gui Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Use the same predicate as reduce_addI > - Remove the REDUCTION_OP enumeration type and use Opcode to represent the operation Marked as reviewed by yzhu (Author). 
------------- PR: https://git.openjdk.org/jdk/pull/11036 From yzhu at openjdk.org Wed Nov 9 06:39:31 2022 From: yzhu at openjdk.org (Yanhong Zhu) Date: Wed, 9 Nov 2022 06:39:31 GMT Subject: RFR: 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation [v2] In-Reply-To: <lN_2ZBx2Q8Vz-2wIsEDev2AfR-kqQuEVqIULdKmo0NE=.925a3ba6-941d-40ef-9536-8fc260ccbeb6@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> <lN_2ZBx2Q8Vz-2wIsEDev2AfR-kqQuEVqIULdKmo0NE=.925a3ba6-941d-40ef-9536-8fc260ccbeb6@github.com> Message-ID: <gqAN5SMirJVoL-ibiOH0J-08HrKdgxwfHgFlquyvfaY=.0210dfc4-d020-4102-b2eb-f42cf0e1ff2f@github.com> On Wed, 9 Nov 2022 03:52:47 GMT, Yanhong Zhu <yzhu at openjdk.org> wrote: >> Gui Cao has updated the pull request incrementally with two additional commits since the last revision: >> >> - Use the same predicate as reduce_addI >> - Remove the REDUCTION_OP enumeration type and use Opcode to represent the operation > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1717: > >> 1715: vredxor_vs(tmp, src2, tmp); >> 1716: break; >> 1717: case Op_MaxReductionV: > > We implemented Byte/Short MaxReductionV like this patch before. But there are 2 test cases that failed: > jdk/jdk/incubator/vector/ShortMaxVectorTests.java > jdk/jdk/incubator/vector/ByteMaxVectorTests.java > The reason is that "make_reduction_input" will return INT_MAX for Byte & Short type. > Did you meet this problem? > I see that this has been fixed in the previous PR 8271515. Approved. 
------------- PR: https://git.openjdk.org/jdk/pull/11036 From dzhang at openjdk.org Wed Nov 9 06:39:31 2022 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 9 Nov 2022 06:39:31 GMT Subject: RFR: 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation [v2] In-Reply-To: <gqAN5SMirJVoL-ibiOH0J-08HrKdgxwfHgFlquyvfaY=.0210dfc4-d020-4102-b2eb-f42cf0e1ff2f@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> <lN_2ZBx2Q8Vz-2wIsEDev2AfR-kqQuEVqIULdKmo0NE=.925a3ba6-941d-40ef-9536-8fc260ccbeb6@github.com> <gqAN5SMirJVoL-ibiOH0J-08HrKdgxwfHgFlquyvfaY=.0210dfc4-d020-4102-b2eb-f42cf0e1ff2f@github.com> Message-ID: <BuxZRUJMXmO_yyROFzmw6eONsIFwBoitkFBYmWdqjfo=.5753fa82-98f2-4a89-bdd9-795a75a94649@github.com> On Wed, 9 Nov 2022 06:32:40 GMT, Yanhong Zhu <yzhu at openjdk.org> wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1717: >> >>> 1715: vredxor_vs(tmp, src2, tmp); >>> 1716: break; >>> 1717: case Op_MaxReductionV: >> >> We implemented Byte/Short MaxReductionV like this patch before. But there are 2 test cases that failed: >> jdk/jdk/incubator/vector/ShortMaxVectorTests.java >> jdk/jdk/incubator/vector/ByteMaxVectorTests.java >> The reason is that "make_reduction_input" will return INT_MAX for Byte & Short type. >> Did you meet this problem? > >> > > I see that this has been fixed in the previous PR 8271515. > Approved. Hi @yhzhu20, thanks for the review! I think this problem was solved in https://github.com/openjdk/jdk/pull/5873/commits/4e7f606030d67a4f284c375d9c8811f525ae97fd ? 

    @@ -1105,7 +1145,9 @@ Node* ReductionNode::make_reduction_input(PhaseGVN& gvn, int opc, BasicType bt)
         case Op_MinReductionV:
           switch (bt) {
             case T_BYTE:
    +          return gvn.makecon(TypeInt::make(max_jbyte));
             case T_SHORT:
    +          return gvn.makecon(TypeInt::make(max_jshort));
             case T_INT:
               return gvn.makecon(TypeInt::MAX);
             case T_LONG:
    @@ -1120,7 +1162,9 @@ Node* ReductionNode::make_reduction_input(PhaseGVN& gvn, int opc, BasicType bt)
         case Op_MaxReductionV:
           switch (bt) {
             case T_BYTE:
    +          return gvn.makecon(TypeInt::make(min_jbyte));
             case T_SHORT:
    +          return gvn.makecon(TypeInt::make(min_jshort));
             case T_INT:
               return gvn.makecon(TypeInt::MIN);
             case T_LONG:

The function "make_reduction_input" will return different types for Byte & Short than for int now.

-------------

PR: https://git.openjdk.org/jdk/pull/11036

From yzhu at openjdk.org Wed Nov 9 06:47:45 2022
From: yzhu at openjdk.org (Yanhong Zhu)
Date: Wed, 9 Nov 2022 06:47:45 GMT
Subject: RFR: 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation [v2]
In-Reply-To: <gqAN5SMirJVoL-ibiOH0J-08HrKdgxwfHgFlquyvfaY=.0210dfc4-d020-4102-b2eb-f42cf0e1ff2f@github.com>
References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com>
 <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com>
 <lN_2ZBx2Q8Vz-2wIsEDev2AfR-kqQuEVqIULdKmo0NE=.925a3ba6-941d-40ef-9536-8fc260ccbeb6@github.com>
 <gqAN5SMirJVoL-ibiOH0J-08HrKdgxwfHgFlquyvfaY=.0210dfc4-d020-4102-b2eb-f42cf0e1ff2f@github.com>
Message-ID: <R1PCGqUPvTU_FlelVWCDuvtbTfX8qGsuKmQIDW4weEc=.f1020b58-cde5-4836-a66e-660e9e73e80c@github.com>

On Wed, 9 Nov 2022 06:32:40 GMT, Yanhong Zhu <yzhu at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1717:
>>
>>> 1715: vredxor_vs(tmp, src2, tmp);
>>> 1716: break;
>>> 1717: case Op_MaxReductionV:
>>
>> We implemented Byte/Short MaxReductionV like this patch before.
But there are 2 test cases that failed: >> jdk/jdk/incubator/vector/ShortMaxVectorTests.java >> jdk/jdk/incubator/vector/ByteMaxVectorTests.java >> The reason is that "make_reduction_input" will return INT_MAX for Byte & Short type. >> Did you meet this problem? > >> > > I see that this has been fixed in the previous PR 8271515. > Approved. > Hi @yhzhu20, thanks for the review! I think this problem was solved in [4e7f606](https://github.com/openjdk/jdk/commit/4e7f606030d67a4f284c375d9c8811f525ae97fd) ? > > ```diff > @@ -1105,7 +1145,9 @@ Node* ReductionNode::make_reduction_input(PhaseGVN& gvn, int opc, BasicType bt) > case Op_MinReductionV: > switch (bt) { > case T_BYTE: > + return gvn.makecon(TypeInt::make(max_jbyte)); > case T_SHORT: > + return gvn.makecon(TypeInt::make(max_jshort)); > case T_INT: > return gvn.makecon(TypeInt::MAX); > case T_LONG: > @@ -1120,7 +1162,9 @@ Node* ReductionNode::make_reduction_input(PhaseGVN& gvn, int opc, BasicType bt) > case Op_MaxReductionV: > switch (bt) { > case T_BYTE: > + return gvn.makecon(TypeInt::make(min_jbyte)); > case T_SHORT: > + return gvn.makecon(TypeInt::make(min_jshort)); > case T_INT: > return gvn.makecon(TypeInt::MIN); > case T_LONG: > ``` > > The function "make_reduction_input" will return different types for Byte & Short than for int now. Understood. Thank you. 
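
The make_reduction_input change discussed in this thread can be illustrated with a small Java sketch. It assumes that the int-sized identity ends up truncated to the lane width before the reduction, which the thread does not spell out, so treat it as one plausible reading of the failure rather than a description of the RVV code. The class and method names are hypothetical.

```java
// Hypothetical illustration of why a lane-sized identity matters; not HotSpot code.
final class ReductionIdentity {
    // Max-reduce a byte array, seeding the accumulator with `identity`
    // narrowed to byte, the way an 8-bit lane would hold it.
    static byte maxReduce(byte[] values, int identity) {
        byte acc = (byte) identity; // narrowing keeps only the low 8 bits
        for (byte v : values) {
            acc = (byte) Math.max(acc, v);
        }
        return acc;
    }

    // Same idea for a min reduction.
    static byte minReduce(byte[] values, int identity) {
        byte acc = (byte) identity;
        for (byte v : values) {
            acc = (byte) Math.min(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] negatives = {-5, -9, -3};
        byte[] positives = {5, 9, 3};

        // Integer.MIN_VALUE narrows to 0, which beats every negative element.
        System.out.println(maxReduce(negatives, Integer.MIN_VALUE)); // prints 0
        System.out.println(maxReduce(negatives, Byte.MIN_VALUE));    // prints -3

        // Integer.MAX_VALUE narrows to -1, which undercuts every positive element.
        System.out.println(minReduce(positives, Integer.MAX_VALUE)); // prints -1
        System.out.println(minReduce(positives, Byte.MAX_VALUE));    // prints 3
    }
}
```

With the lane-sized constants (min_jbyte/max_jbyte and min_jshort/max_jshort in the diff) the seed can never win against a real element, which is the property an identity value needs.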
------------- PR: https://git.openjdk.org/jdk/pull/11036 From dzhang at openjdk.org Wed Nov 9 06:47:45 2022 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 9 Nov 2022 06:47:45 GMT Subject: RFR: 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation [v2] In-Reply-To: <R1PCGqUPvTU_FlelVWCDuvtbTfX8qGsuKmQIDW4weEc=.f1020b58-cde5-4836-a66e-660e9e73e80c@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> <lN_2ZBx2Q8Vz-2wIsEDev2AfR-kqQuEVqIULdKmo0NE=.925a3ba6-941d-40ef-9536-8fc260ccbeb6@github.com> <gqAN5SMirJVoL-ibiOH0J-08HrKdgxwfHgFlquyvfaY=.0210dfc4-d020-4102-b2eb-f42cf0e1ff2f@github.com> <R1PCGqUPvTU_FlelVWCDuvtbTfX8qGsuKmQIDW4weEc=.f1020b58-cde5-4836-a66e-660e9e73e80c@github.com> Message-ID: <WxK5g1jJYU2FgyB084dQS_K4EfGsl3Bej9q6kBte5D8=.500405f3-7ef1-4fb2-acce-020a8cc89743@github.com> On Wed, 9 Nov 2022 06:42:43 GMT, Yanhong Zhu <yzhu at openjdk.org> wrote: >>> >> >> I see that this has been fixed in the previous PR 8271515. >> Approved. > >> Hi @yhzhu20, thanks for the review! I think this problem was solved in [4e7f606](https://github.com/openjdk/jdk/commit/4e7f606030d67a4f284c375d9c8811f525ae97fd) ? 
>> >> ```diff >> @@ -1105,7 +1145,9 @@ Node* ReductionNode::make_reduction_input(PhaseGVN& gvn, int opc, BasicType bt) >> case Op_MinReductionV: >> switch (bt) { >> case T_BYTE: >> + return gvn.makecon(TypeInt::make(max_jbyte)); >> case T_SHORT: >> + return gvn.makecon(TypeInt::make(max_jshort)); >> case T_INT: >> return gvn.makecon(TypeInt::MAX); >> case T_LONG: >> @@ -1120,7 +1162,9 @@ Node* ReductionNode::make_reduction_input(PhaseGVN& gvn, int opc, BasicType bt) >> case Op_MaxReductionV: >> switch (bt) { >> case T_BYTE: >> + return gvn.makecon(TypeInt::make(min_jbyte)); >> case T_SHORT: >> + return gvn.makecon(TypeInt::make(min_jshort)); >> case T_INT: >> return gvn.makecon(TypeInt::MIN); >> case T_LONG: >> ``` >> >> The function "make_reduction_input" will return different types for Byte & Short than for int now. > > Understood. Thank you. > > > > I see that this has been fixed in the previous PR 8271515. Approved. Sorry, I didn't refresh the page in time, so I saw your message too late and repeated the same message. ------------- PR: https://git.openjdk.org/jdk/pull/11036 From thomas.stuefe at gmail.com Wed Nov 9 07:12:59 2022 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 9 Nov 2022 08:12:59 +0100 Subject: Extend Native Memory Tracking over the JDK ? (was: Proposal: track zlib native memory usage with NMT) In-Reply-To: <c5db2105-3b78-0b77-779f-011c8649f476@oracle.com> References: <CAA-vtUzvyqb_LZ9d1Z80oSvzyqpi89053O3B+oDZF4oZ85CuZg@mail.gmail.com> <c5db2105-3b78-0b77-779f-011c8649f476@oracle.com> Message-ID: <CAA-vtUxrP49w523EQ3yKFh4Zo5W+HMtqpGcsUxOEMkwZUOQ0zQ@mail.gmail.com> Hi Alan, (replaced hotspot-runtime-dev with hotspot-dev, since its more of a general topic) thank you for your time! I am very happy to talk this through. I think native memory observability in the JDK (and customer code!) is sorely lacking. Witness the countless "where did my native memory go" blog articles. 
At SAP we have been struggling with this topic for a long time and have come up with a mixture of solutions. The aforementioned tracker was one, which extended our version of NMT across the JDK. Our SapMachine MallocTracer, which allows us to trace uninstrumented customer code, another. We even experimented with exchanging the allocator (using jemalloc) to gain insights. But that is a whole different topic with deep logistical implications, I don't want to touch it here. Exchanging the allocator does not help to observe virtual memory or the brk segment, of course. And to make the picture complete, another insight we currently lack is the implicit allocator overhead, which can be very significant and is hidden by the libc. We also have observability for that in the SapMachine, and I miss it in OpenJDK. As you noticed, my original intent was just to instrument Zlib and possibly improve tracking for DBBs. Although, thinking beyond that, another attractive instrumentation target would be mapped NIO buffers at least. So I think native memory observability is important. Arguably we could even extend observability to cover other OS resources, e.g. file handles. If we shift code around, to java/Panama: data that move the java heap does not need to be tracked, but other memory will always come from one of the basic system APIs, regardless of who allocates it and where in the stack allocation happens. Be it native JDK code, Panama, or even customer JNI code. If we agree on the importance of native memory observability, then I believe NMT is the right tool for it. It is a good tool. The machinery is already there. It covers both C-heap and virtual memory APIs, as well as thread stacks, and could easily be extended to cover sbrk if needed. And I assume that whatever shape OpenJDK takes on in the future, there always will be a libjvm.so at its core, so we will always have it. 
But even if not, NMT could be separated from libjvm.so quite easily, since it has no deep ties with the JVM. About coupling JVM with outside code: We don't have to directly link against libjvm.so. We can keep things loose if the intent is to be runnable without a JVM, or be JVM-version-agnostic. That could take the form of a function-pointer interface like JVMTI. Or outside code could dynamically dlsym the JVM allocation hooks. In any case gracefully falling back to system allocation routines when necessary. And I agree, polluting the NMT tag space with outside meaning is ugly. I only did it because I planned to go no further than instrumenting Zlib and possibly DBBs. But if we take this further, my preferred solution would be a reserved tag range or -ranges for outside use, whose inner meaning would be opaque to the JVM. Kind of like SIGRTMIN+SIGRTMAX. Then, outside code could register tags and their meta information with the JVM, or we find a different way to convey the tag meaning to NMT (config files, or callbacks). That could even be opened up for customer use. This also touches on another question, that of NMT tag space. NMT tags are very useful since they allow cheap tracking without capturing call stacks. However, tags are underused and show growing pains since they are too one-dimensional and restrictive. We had competing interests in the past about tag granularity. It is all over the place. We have coarse-grained tags like "mtThread", and very fine-grained ones like "mtObjectMonitor". There are several ways we could improve, e.g., by making them combinable like UL does, or allowing for a hierarchy of them - either a hard-wired limited one like "domain"+"tag", or an unlimited tree-like one. Technically interesting since whatever the new encoding is, they still must fit into a malloc header. I opened https://bugs.openjdk.org/browse/JDK-8281819 to track ideas like these. 
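
To make the "reserved tag range" idea a bit more concrete, here is a minimal Java sketch of what registering outside tags might look like. Everything in it is hypothetical: NMT has no such registry today, and the class name `ExternalTagRegistry`, the range 200..255, and the method names are assumptions chosen only to model the proposal (a fixed range whose meaning stays opaque to the tracker, in the spirit of SIGRTMIN..SIGRTMAX).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a reserved tag range for code outside the JVM.
final class ExternalTagRegistry {
    // Assumed reserved range; the real tag space and its size may differ.
    static final int FIRST_EXTERNAL_TAG = 200;
    static final int LAST_EXTERNAL_TAG = 255;

    private final Map<Integer, String> descriptions = new HashMap<>();
    private int nextTag = FIRST_EXTERNAL_TAG;

    // Hands out the next free tag. The description is only stored for
    // reporting; the tracker itself never interprets it.
    synchronized int register(String description) {
        if (nextTag > LAST_EXTERNAL_TAG) {
            throw new IllegalStateException("external tag range exhausted");
        }
        int tag = nextTag++;
        descriptions.put(tag, description);
        return tag;
    }

    // True if the tag lies in the range reserved for outside use.
    static boolean isExternal(int tag) {
        return tag >= FIRST_EXTERNAL_TAG && tag <= LAST_EXTERNAL_TAG;
    }

    synchronized String describe(int tag) {
        return descriptions.getOrDefault(tag, "unregistered");
    }

    public static void main(String[] args) {
        ExternalTagRegistry registry = new ExternalTagRegistry();
        int zlibTag = registry.register("zlib inflate/deflate buffers");
        System.out.println(zlibTag + " -> " + registry.describe(zlibTag));
    }
}
```

A fixed numeric range keeps each tag small enough to stay in a malloc header, while the descriptive text lives outside the header and can be as detailed as the registering library wants.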

Instrumenting Panama allocations, including the ability to tag allocations, would be a very good idea. For instance, if we ever remove the native Zlib layer and convert it to Java using Panama, we can do the same with Panama as I do now natively - use the Zlib zalloc interface to hook in JVM memory allocation functions. The result could be completely identical, and the end user looking at the NMT output need never know that anything changed. And that goes for all instrumentation - if today we add it to JNI code, and that code gets removed tomorrow, we can add it to Panama code too. Unless data structures move to the heap, in which case there is no need to track them.

You mentioned that NMT was more of an in-house support tool. Our experience is different. Even though it was positioned as a tool for JVM developers, and we never cared for the backward compatibility or consistency, it gets used a *lot* by our customers. We have to explain its output frequently. Also, many blog articles exist documenting its use. So, maybe it would be okay to elevate it to a user-facing tool since it seems to occupy that role anyway. We may also open up consumption of NMT results via Java APIs, or expose its results via MXBeans.

If this is to be a JEP, okay, but I'm afraid it would stall things a bit. I am interested in getting a simpler and quicker solution for older support releases at least, possibly based on my PR. I know that would be unconventional though.

Thank you, Thomas

On Sun, Nov 6, 2022 at 9:31 AM Alan Bateman <Alan.Bateman at oracle.com> wrote:

> On 04/11/2022 16:54, Thomas Stüfe wrote:
> > Hi all,
> >
> > I am currently working on https://bugs.openjdk.org/browse/JDK-8296360;
> > I was preparing the final PR [1], but then Alan did ask me to discuss
> > this on core-libs first.
> >
> > Backstory:
> >
> > NMT tracks hotspot native allocations but does not cover the JDK
> > libraries (small exception: Unsafe.AllocateMemory).
However, the > > native memory footprint of JDK libraries can be significant. We have > > no in-VM tracker for these and need tools like valgrind or our > > SapMachine MallocTracer [2] to observe them. > > Thanks for starting a discussion on this as this is a topic that > requires agreement from several areas. If this is the start of something > bigger, where you want to have all allocation sites in the libraries > using NMT, then I think it needs a write-up, maybe a JEP. > > For starters, I think it needs some agreement on using NMT for memory > allocated outside of libjvm. You mentioned Unsafe as an exception but > that is implemented in the VM so you get tracking for free, albeit I > think all allocations are in the "mtOther" category. > > A general concern is that it creates more coupling between the VM code > and the libraries code. As you probably know, we've removed most of the > dependences on JVM_* functions from non-core areas over many years. So I > think that needs consideration as I assume we don't want > memory/allocation.hpp declaring a dozen catagories for allocations done > in say java.desktop module for example. Maybe your proposal will be > strictly limited to java.base but even then, do we really want the VM > even knowing about categories that are specific to zip compression or > decompression? > > There are probably longer term trends that should be part of the > discussion too. One general trend is that "run time" is becoming more > and more a hybrid of code in libvm and the Java libraries. Lambdas, > module system, virtual threads implementations are a few examples in the > last few release. This comes with many "Java on Java" challenges, > including serviceability where users of the platform will expect tools > to just work and won't care where the code is. NMT is probably more for > support teams and not something that most developers will ever use but I > think is part of the challenge of having serviceability solutions "just > work". 
> > In addition to having more of the Java runtime written in Java, there > will likely be less JNI code in the future. It's very possible that the > JNI code (including the JNI methods in libzip) will be replaced with > code that uses Panama memory and linker APIs once they are become > permanent. The effect of that would to have a lot of the memory > allocations be tracked in the mtOther category again. Maybe integration > with memory tracking should be looked at in conjunction with these APIs > and this migration. I could imagine the proposed "Arena" API > (MemorySession in Java 19) having some integration with NMT and it might > be interesting to look into that. > > So yes, this topic does need broader discussion and it might be a bit > premature to start with a PR for libzip without talking about the bigger > picture first. > > -Alan > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20221109/4236b240/attachment-0001.htm> From thomas.stuefe at gmail.com Wed Nov 9 07:19:47 2022 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 9 Nov 2022 08:19:47 +0100 Subject: RFR: JDK-8293402: hs-err file printer should reattempt stack trace printing if it fails In-Reply-To: <Rp3PfXUu7WB3uW3zDPIQBFtGPFACAy4uejtmjwXzzUY=.a9381947-12f8-40fd-9fe4-dfde0d8787a2@github.com> References: <KKGFd2UpKLmbHnMnMeYIfdUBNb4ely3fqlxIHO6U-VI=.a772ea05-f210-4f27-b091-010ff94f04aa@github.com> <Rp3PfXUu7WB3uW3zDPIQBFtGPFACAy4uejtmjwXzzUY=.a9381947-12f8-40fd-9fe4-dfde0d8787a2@github.com> Message-ID: <CAA-vtUz+gMNS-WsyKZ2VcK-=3h2bHiMTxyev6PWqFP3fWwWeQw@mail.gmail.com> Please ignore the last mail. Posted into the wrong PR. On Wed, Nov 9, 2022 at 6:13 AM Thomas Stuefe <stuefe at openjdk.org> wrote: > On Tue, 6 Sep 2022 08:05:23 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > > > Hi, > > > > may I have reviews for this small improvement. 
> > > > The call stack may be the most important part of an hs-err file. We > recently introduced printing of source information ( > https://bugs.openjdk.org/browse/JDK-8242181) which is nice but makes > stack printing more vulnerable for two reasons: > > - we may crash due to a programmer error (e.g. > https://bugs.openjdk.org/browse/JDK-8293344) > > - we may timeout on very slow machines/file systems when the source > information are parsed from the debug info (we have seen those problems in > the past) > > > > Therefore, VMError should retry stack printing without source > information if the first attempt to print failed. > > > > Examples: > > > > Step timeouts while retrieving source info: > > > > > > 24 --------------- T H R E A D --------------- > > 25 > > 26 Current thread (0x00007f70ac028bd0): JavaThread "main" > [_thread_in_vm, id=565259, stack(0x00007f70b0587000,0x00007f70b0688000)] > > 27 > > 28 Stack: [0x00007f70b0587000,0x00007f70b0688000], > sp=0x00007f70b0686cf0, free space=1023k > > 29 Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, > C=native code) > > 30 V [libjvm.so+0x1cd41c1] VMError::controlled_crash(int)+0x241 > > 31 [timeout occurred during error reporting in step "printing native > stack (with source info)"] after 30 s. > > 32 > > 33 Retrying call stack printing without source information... 
> > 34 Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, > C=native code) > > 35 V [libjvm.so+0x1cd41c1] VMError::controlled_crash(int)+0x241 > > 36 V [libjvm.so+0x11cbe45] JNI_CreateJavaVM+0x5b5 > > 37 C [libjli.so+0x4013] JavaMain+0x93 > > 38 C [libjli.so+0x800d] ThreadJavaMain+0xd > > 39 > > > > > > > > Step crashes while retrieving source info: > > > > > > 24 --------------- T H R E A D --------------- > > 25 > > 26 Current thread (0x00007fc000028bd0): JavaThread "main" > [_thread_in_vm, id=569254, stack(0x00007fc00573c000,0x00007fc00583d000)] > > 27 > > 28 Stack: [0x00007fc00573c000,0x00007fc00583d000], > sp=0x00007fc00583bcf0, free space=1023k > > 29 Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, > C=native code) > > 30 V [libjvm.so+0x1cd41e1] VMError::controlled_crash(int)+0x241 > > 31 [error occurred during error reporting (printing native stack (with > source info)), id 0xb, SIGSEGV (0xb) at pc=0x00007fc006694d78] > > 32 > > 33 > > 34 Retrying call stack printing without source information... > > 35 Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, > C=native code) > > 36 V [libjvm.so+0x1cd41e1] VMError::controlled_crash(int)+0x241 > > 37 V [libjvm.so+0x11cbe65] JNI_CreateJavaVM+0x5b5 > > 38 C [libjli.so+0x4013] JavaMain+0x93 > > 39 C [libjli.so+0x800d] ThreadJavaMain+0xd > > > > > > > > Thanks, Thomas > > Hi Axel, > > > > Each time we crash out in VMError, we build up the stack. We never > ever unwind that stack, eg. via longjmp, since that would introduce other > errors (e.g. abandoned locks). There is a natural limit to how many > recursive crashes we can handle since the stack is not endless. Each > secondary crash increases the risk of running into guard pages and spoiling > the game for follow-up STEPs. 
> > > > I have no experience with stack depth being the problem in crashes (with > the caveat that I have only run with this patch for a few weeks), but have > experienced cases where only having a hs_err file available and having the > register print_location bailing out early, where missing printing the rest > has been unfavourable. > > I don't doubt your experience, but for us the picture is different. We > (SAP) have been working with this code for ~20yrs and hs-err files are one > of our main tool of support. That is why we contributed so much hardening > code, and why I am so interested. For us, torn hs-err files - by running > either out of time, stack space, or against the recursive limit - are a > frequent annoyance. Note that with stack space, you won't notice unless you > explicitly investigate - the hs-err file is just either abridged or not > there at all, but you won't get a clear message. > > I believe the problem you want to solve is real, but maybe we can find a > better solution than introducing a mass of new STEPs. Since by doing that > you effectively reduce the chance of later STEPs running successfully. > Stack and time are valuable resources all STEPs share, even if the > recursive limit is a bit arbitrary. And at least for stack space, I don't > see a solution - each crash will build up stack, and there is nothing to > prevent this. > > > > > > Therefore we limit the number of allowed recursive crashes. This limit > also serves a second purpose: if we crash that often, maybe we should just > stop already and let the process die. > > > > Fair, tough I am curious how we want to decide this limit, and why ~60 > is fine but ~90 would be too much (I am guessing that most steps have no, > or a very small possibility of crashing). > > No clue. This is quite old code from Sun times, I believe. > > >Maybe this should instead be solved with a general solution which stops > the reporting if some retry limit is reached. 
> > How would this be different from what we are doing now? > > > > > Also the common case is that it does not crash repeatedly, and if it > does, that is the scenario where I personally really would want the > information, because something is seriously wrong. > > I agree, but my conclusion differs from yours: if a STEP does X, and X > crashes repeatedly out, it makes sense at some point to stop doing X, since > the probability that it will crash again is quite high. It is better to > stop and proceed with the next STEPs, since their output may be more > valuable than just another failed printout in the crashing STEP. > > > But maybe not at the cost of stack overflows, if it is a problem maybe > some stack address limit can used to disable reentry in reentrant steps. > > > > > That brings me to the second problem, which is time. When we crash, we > want to go down as fast as possible, e.g. allow a server to restart the > node. OTOH we want a nice hs-err file. Therefore the time error handling is > allowed to take is carefully limited. See `ErrorLogTimeout`: by default 2 > Minutes, though our customers usually lower this to 30 seconds or even > lower. > > > Each STEP has a timeout, set to a fraction of that total limit (A > quarter). A quarter gives us room for 2-3 hanging STEPS and still leaves > enough breathing room for the remainder of the STEPS. > > > If you now increase the number of STEPS, all these calculations are > off. We may hit the recursive error limit much sooner, since every > individual register printout may crash. And if they hang, they may eat up > the ErrorLogTimeout much sooner. So we will get more torn hs-err files with > "recursive limit reached, giving up" or "timeout reached, giving up". > > > > The timeout problem was something I thought about as well, and I think > you are correct, and that we should treat the whole reentrant step as one > timeout. (Same behaviour as before). 
> > > > > Note that one particularly fragile information is the printing of > debug info, e.g. function name, etc. Since that relies on some parsing of > debugging information. In our experience that can crash out or hang often, > especially if the debug info has to be read from file or network. > > > > Alright, I see this as an argument for reentrant steps with one timeout > for all iterations of the inner loop combined. > > > > I've heard opinions of something similar to reentrant steps in other > parts of the hs_err printing. Like stack frame printing, where you can have > iterative stages where each stage builds up more detailed information, > until it crashes. And then prints what information it got so far. > > I remember, we had this discussion before, and I brought up the same > arguments. > > I agree with @dean-long, it would be good to know what is going wrong > first before changing the base mechanism. And harden the printing code > instead, e.g. via SafeFetch/os::is_readable_pointer. Because SafeFetch, in > contrast to this STEP mechanism, does not build up stack. Or, if applying > SafeFetch is no reasonable option, retry the whole STEP with some of the > riskier options disabled, like I did here: > https://github.com/openjdk/jdk/pull/10179. > > Cheers, Thomas > > ------------- > > PR: https://git.openjdk.org/jdk/pull/10179 > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20221109/081cb20f/attachment.htm> From fjiang at openjdk.org Wed Nov 9 07:20:29 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 9 Nov 2022 07:20:29 GMT Subject: RFR: 8296301: Interpreter(RISC-V): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options In-Reply-To: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com> References: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com> Message-ID: <bnXkpB5Np3lWKB-pzJIoqiogYS1nl-QLkCsIy1OtVUM=.8e2f56aa-f686-4273-8567-94e94d6bf44d@github.com> On Wed, 9 Nov 2022 03:08:57 GMT, Yanhong Zhu <yzhu at openjdk.org> wrote: > In this patch, count_bytecode() is modified by using "x7" as temporary register. Also implement histogram_bytecode() and histogram_bytecode_pair(), which can be enabled on debug mode by setting the options PrintBytecodeHistogram and PrintBytecodePairHistogram. > > The following is the output when PrintBytecodeHistogram or PrintBytecodePairHistogram is TRUE. 
> > $ java -XX:+PrintBytecodeHistogram --version|head -n 20 > openjdk 20 2022-11-09 > OpenJDK Runtime Environment (fastdebug build 20) > OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode) > > Histogram of 8101142 executed bytecodes: > > absolute relative code name > ---------------------------------------------------------------------- > 634592 7.83% dc fast_aload_0 > 471840 5.82% b6 invokevirtual > 376275 4.64% 2b aload_1 > 358520 4.43% e0 fast_iload > 332267 4.10% de fast_aaccess_0 > 270189 3.34% a7 goto > 249831 3.08% 19 aload > 223361 2.76% b9 invokeinterface > 215666 2.66% 1c iload_2 > 194877 2.41% b8 invokestatic > 192212 2.37% 2c aload_2 > 185826 2.29% 1b iload_1 > > $ java -XX:+PrintBytecodePairHistogram --version|head -n 20 > openjdk 20 2022-11-09 > OpenJDK Runtime Environment (fastdebug build 20) > OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode) > > Histogram of 7627721 executed bytecode pairs: > > absolute relative codes 1st bytecode 2nd bytecode > ---------------------------------------------------------------------- > 102673 1.346% 84 a7 iinc goto > 85429 1.120% dc 2b fast_aload_0 aload_1 > 84394 1.106% dc b6 fast_aload_0 invokevirtual > 73131 0.959% b7 dc invokespecial fast_aload_0 > 64605 0.847% 2b b6 aload_1 invokevirtual > 64086 0.840% dc b9 fast_aload_0 invokeinterface > 63663 0.835% b6 dc invokevirtual fast_aload_0 > 59946 0.786% b6 de invokevirtual fast_aaccess_0 > 56631 0.742% 36 e0 istore fast_iload > 51261 0.672% b9 de invokeinterface fast_aaccess_0 > 49556 0.650% 3a 19 astore aload > 49106 0.644% a7 e0 goto fast_iload Marked as reviewed by fjiang (Author). 
------------- PR: https://git.openjdk.org/jdk/pull/11051 From stuefe at openjdk.org Wed Nov 9 07:27:08 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 9 Nov 2022 07:27:08 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> Message-ID: <s0chXUFW6U2vWew4oUi6zLlrcj_a6Uzql2vNleYPKHA=.0f13741a-7b40-4116-9327-e67e8c19e4c2@github.com> On Mon, 7 Nov 2022 13:24:26 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. > > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. > > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) Hi Axel, sorry for the delay, I believe I posted the answer into the wrong PR. > > Each time we crash out in VMError, we build up the stack. We never ever unwind that stack, eg. via longjmp, since that would introduce other errors (e.g. 
abandoned locks). There is a natural limit to how many recursive crashes we can handle since the stack is not endless. Each secondary crash increases the risk of running into guard pages and spoiling the game for follow-up STEPs. > > I have no experience with stack depth being the problem in crashes (with the caveat that I have only run with this patch for a few weeks), but have experienced cases where only having a hs_err file available and having the register print_location bailing out early, where missing printing the rest has been unfavourable. I don't doubt your experience, but for us the picture is different. We (SAP) have been working with this code for ~20yrs and hs-err files are one of our main tool of support. That is why we contributed so much hardening code, and why I am so interested. For us, torn hs-err files - by running either out of time, stack space, or against the recursive limit - are a frequent annoyance. Note that with stack space, you won't notice unless you explicitly investigate - the hs-err file is just either abridged or not there at all, but you won't get a clear message. I believe the problem you want to solve is real, but maybe we can find a better solution than introducing a load of new STEPs. Since by doing that you effectively reduce the chance of later STEPs running successfully. Stack and time are valuable resources all STEPs share, even if the recursive limit is a bit arbitrary. And at least for stack space, I don't see a solution - each crash will build up stack, and there is nothing to prevent this. > > > Therefore we limit the number of allowed recursive crashes. This limit also serves a second purpose: if we crash that often, maybe we should just stop already and let the process die. > > Fair, tough I am curious how we want to decide this limit, and why ~60 is fine but ~90 would be too much (I am guessing that most steps have no, or a very small possibility of crashing). No clue. 
This is quite old code from Sun times, I believe. >Maybe this should instead be solved with a general solution which stops the reporting if some retry limit is reached. How would this be different from what we are doing now? > > Also the common case is that it does not crash repeatedly, and if it does, that is the scenario where I personally really would want the information, because something is seriously wrong. I agree, but my conclusion differs from yours: if a STEP does X, and X crashes repeatedly out, it makes sense at some point to stop doing X, since the probability that it will crash again is quite high. It is better to stop and proceed with the next STEPs, since their output may be more valuable than just another failed printout in the crashing STEP. > But maybe not at the cost of stack overflows, if it is a problem maybe some stack address limit can used to disable reentry in reentrant steps. > > > That brings me to the second problem, which is time. When we crash, we want to go down as fast as possible, e.g. allow a server to restart the node. OTOH we want a nice hs-err file. Therefore the time error handling is allowed to take is carefully limited. See `ErrorLogTimeout`: by default 2 Minutes, though our customers usually lower this to 30 seconds or even lower. > > Each STEP has a timeout, set to a fraction of that total limit (A quarter). A quarter gives us room for 2-3 hanging STEPS and still leaves enough breathing room for the remainder of the STEPS. > > If you now increase the number of STEPS, all these calculations are off. We may hit the recursive error limit much sooner, since every individual register printout may crash. And if they hang, they may eat up the ErrorLogTimeout much sooner. So we will get more torn hs-err files with "recursive limit reached, giving up" or "timeout reached, giving up". 
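The timeout budget described above can be made concrete with a little arithmetic. The following is only an illustrative sketch: the helper names are hypothetical (they are not HotSpot functions), and it assumes the per-STEP limit is exactly a quarter of the total `ErrorLogTimeout`, as stated above.

```cpp
#include <cassert>

// Hypothetical helper (not a HotSpot function): per-STEP budget in seconds,
// assuming each STEP may use a quarter of the total error-reporting timeout.
inline int step_timeout_seconds(int total_timeout_seconds) {
  assert(total_timeout_seconds >= 0);
  return total_timeout_seconds / 4;
}

// Hypothetical helper: seconds left for all remaining STEPs after
// `hanging_steps` STEPs have each run into their per-STEP timeout.
inline int remaining_seconds(int total_timeout_seconds, int hanging_steps) {
  assert(hanging_steps >= 0);
  const int used = hanging_steps * step_timeout_seconds(total_timeout_seconds);
  return used >= total_timeout_seconds ? 0 : total_timeout_seconds - used;
}
```

With the default of 120 seconds, each STEP gets 30 seconds, and three hanging STEPs leave 30 seconds for everything else. If every register printout could hang on its own timeout, a fourth hanging printout would already exhaust the budget.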
> > The timeout problem was something I thought about as well, and I think you are correct, and that we should treat the whole reentrant step as one timeout. (Same behaviour as before). > > > Note that one particularly fragile piece of information is the printing of debug info, e.g. function name, etc., since that relies on parsing debugging information. In our experience that can crash out or hang often, especially if the debug info has to be read from file or network. > > Alright, I see this as an argument for reentrant steps with one timeout for all iterations of the inner loop combined. > > I've heard opinions of something similar to reentrant steps in other parts of the hs_err printing. Like stack frame printing, where you can have iterative stages where each stage builds up more detailed information, until it crashes. And then prints what information it got so far. I remember this discussion. > It would be nice to know why register printing crashes, and if there is something we could do to make it more robust, like checking os::is_readable_pointer(). I agree with @dean-long, it would be good to know what is going wrong first, and to harden the printing code with SafeFetch or os::is_readable_pointer, because SafeFetch, in contrast to this STEP mechanism, does not build up stack. Or, if applying SafeFetch is no reasonable option, retry the whole STEP with risky options disabled, as I did here: https://github.com/openjdk/jdk/pull/10179.
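The "retry the whole STEP with risky options disabled" idea can be sketched roughly as follows. This is a simplified, hypothetical model and not the code from the PR: it uses a C++ exception to stand in for a secondary crash, whereas VMError recovers through its signal handler, and the `Frame` type and both function names are invented for illustration.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a native frame; `source_info_readable` models
// whether decoding the debug info for this frame would succeed.
struct Frame {
  std::string symbol;
  bool source_info_readable;
};

// Hypothetical printer: throws (in place of crashing) when source info is
// requested for a frame whose debug info cannot be decoded.
inline std::string print_stack(const std::vector<Frame>& frames,
                               bool with_source_info) {
  std::string out;
  for (const Frame& f : frames) {
    if (with_source_info && !f.source_info_readable) {
      throw std::runtime_error("failed while decoding source info");
    }
    out += f.symbol;
    out += with_source_info ? " (with source)\n" : "\n";
  }
  return out;
}

// First attempt uses the risky option; on failure the whole step is retried
// once with that option disabled, instead of retrying frame by frame.
inline std::string print_stack_with_fallback(const std::vector<Frame>& frames) {
  try {
    return print_stack(frames, /*with_source_info=*/true);
  } catch (const std::exception&) {
    return "Retrying call stack printing without source information...\n" +
           print_stack(frames, /*with_source_info=*/false);
  }
}
```

Unlike the real hs-err output, this sketch discards the partial output of the failed first attempt. The point is only that a single coarse retry bounds the number of secondary failures to one per step.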
Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/11017 From aboldtch at openjdk.org Wed Nov 9 07:31:00 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 9 Nov 2022 07:31:00 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <s0chXUFW6U2vWew4oUi6zLlrcj_a6Uzql2vNleYPKHA=.0f13741a-7b40-4116-9327-e67e8c19e4c2@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> <s0chXUFW6U2vWew4oUi6zLlrcj_a6Uzql2vNleYPKHA=.0f13741a-7b40-4116-9327-e67e8c19e4c2@github.com> Message-ID: <KBGE74RyOSqDZmqcqH9m_yS4J9O8kfVDcVOFGObXYoY=.c60e3e6b-4d37-46d7-87fe-fad579a4c7ae@github.com> On Wed, 9 Nov 2022 07:23:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > It would be nice to know why register printing crashes, and if there is something we could do to make it more robust, like checking os::is_readable_pointer(). Personally I have only seen it inside GC code when working on generational ZGC and experimenting with some new features, especially when working on enhanced print_location printing where I break some invariants (because we are already crashing) to sometimes print more detailed information based on potentially stale / racy data. I have only really seen it happen when I do something really stupid. I know this whole feature came up in discussions at the office where people expressed their annoyance at having register printing interrupted halfway through. So maybe someone else has a more concrete example of this occurring.
------------- PR: https://git.openjdk.org/jdk/pull/11017 From duke at openjdk.org Wed Nov 9 07:56:20 2022 From: duke at openjdk.org (duke) Date: Wed, 9 Nov 2022 07:56:20 GMT Subject: Withdrawn: 8293416: ZGC: Set mark bit with unconditional atomic ops In-Reply-To: <fYHIgceX35S2340wfn0JEkQ4uEE7QTZxiCtafEIUpyw=.044eb14e-3326-4887-9575-a62272635d1f@github.com> References: <fYHIgceX35S2340wfn0JEkQ4uEE7QTZxiCtafEIUpyw=.044eb14e-3326-4887-9575-a62272635d1f@github.com> Message-ID: <dHxWJFeMXRhLeOkW_y2VSje2Kg49FGhv09U7ZQs_LVM=.38fa80f6-0989-434a-aba4-f516d6768b3f@github.com> On Tue, 6 Sep 2022 10:48:19 GMT, hev <duke at openjdk.org> wrote: > **Summary** > Support setting the ZGC mark bit with unconditional atomic ops. > > **Motivation** > ZGC currently modifies the mark bitmap with a conditional atomic operation (cmpxchg). This is not optimal, because the loop has to retry whenever cmpxchg fails. > > **Description** > First, this patch set adds a new unconditional atomic operation: Atomic::fetch_and_or, which is implemented in different ways for different CPU architectures: > > * Exclusive access: Non-nested loop > > > retry: > ll old_val, addr > or new_val, old_val, set_val > sc new_val, addr > beq retry > > > * Atomic access: One instruction > > > ldset old_val, set_val, addr > > > * Generic: Fallback to cmpxchg or use C++ __atomic_fetch_or > > **Testing** > * jtreg tests > * benchmark tests This pull request has been closed without being integrated.
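The difference between the conditional and the unconditional way of setting a mark bit, as described in the withdrawn PR above, can be sketched in portable C++. This is not the ZGC or HotSpot `Atomic` code: it uses `std::atomic` with its default (sequentially consistent) ordering as an assumption, and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Conditional variant: retry a compare-exchange until it succeeds.
// Returns true if this call changed the bit from 0 to 1.
inline bool set_bit_cmpxchg(std::atomic<std::uint64_t>& word, unsigned bit) {
  assert(bit < 64);
  const std::uint64_t mask = std::uint64_t{1} << bit;
  std::uint64_t old_val = word.load();
  for (;;) {
    if (old_val & mask) {
      return false;  // bit was already set, nothing to do
    }
    // On failure, compare_exchange_weak reloads old_val and we loop again.
    if (word.compare_exchange_weak(old_val, old_val | mask)) {
      return true;
    }
  }
}

// Unconditional variant: a single fetch-or with no retry loop in the source.
// Returns true if this call changed the bit from 0 to 1.
inline bool set_bit_fetch_or(std::atomic<std::uint64_t>& word, unsigned bit) {
  assert(bit < 64);
  const std::uint64_t mask = std::uint64_t{1} << bit;
  const std::uint64_t old_val = word.fetch_or(mask);
  return (old_val & mask) == 0;
}
```

Both functions report whether the caller was the one who set the bit. Whether `fetch_or` compiles to a single instruction or to a load-linked/store-conditional loop depends on the target architecture and compiler.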
------------- PR: https://git.openjdk.org/jdk/pull/10182 From stefank at openjdk.org Wed Nov 9 08:01:33 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Nov 2022 08:01:33 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <KBGE74RyOSqDZmqcqH9m_yS4J9O8kfVDcVOFGObXYoY=.c60e3e6b-4d37-46d7-87fe-fad579a4c7ae@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> <s0chXUFW6U2vWew4oUi6zLlrcj_a6Uzql2vNleYPKHA=.0f13741a-7b40-4116-9327-e67e8c19e4c2@github.com> <KBGE74RyOSqDZmqcqH9m_yS4J9O8kfVDcVOFGObXYoY=.c60e3e6b-4d37-46d7-87fe-fad579a4c7ae@github.com> Message-ID: <3bI0YB3kTLMevcd_sxTYPsFTCFAYzSAetZVo-VLyO6g=.7545c055-8e3d-478d-a6ea-ff08feae226e@github.com> On Wed, 9 Nov 2022 07:27:34 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > > It would be nice to know why register printing crashes, and if there is something we could do to make it more robust, like checking os::is_readable_pointer(). > > Personally I have only seen it inside GC code when working on generational ZGC and experimenting with some new features, especially when working on enhanced print_location printing where I break some invariants (because we are already crashing) to sometimes print more detailed information based on potentially stale / racy data. I have only really seen it happen when I do something really stupid. > > I know this whole feature came up in discussions at the office where people expressed their annoyance at having register printing interrupted halfway through. So maybe someone else has a more concrete example of this occurring. It's quite common to get broken oops in the registers when you crash in the GC. And when that happens the register printing code usually crashes inside the oop/Klass printing code.
We can probably try to figure out all places that can crash inside the printing code, but I think it will be a significant undertaking, and it would probably uglify the printing code significantly. ------------- PR: https://git.openjdk.org/jdk/pull/11017 From duke at openjdk.org Wed Nov 9 08:04:09 2022 From: duke at openjdk.org (Yi-Fan Tsai) Date: Wed, 9 Nov 2022 08:04:09 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 Message-ID: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> The LEA instruction loads the effective address, but the MD5 intrinsic uses it to compute values rather than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. This change replaces LEA: r1 = r1 + rsi * 1 + t with ADDs: r1 += t; r1 += rsi. Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.
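The replacement described above is purely arithmetic, so its correctness can be illustrated in plain C++. This sketch only shows that the two forms produce the same 32-bit result modulo 2^32. It says nothing about cycle counts, and a C++ compiler is free to emit either instruction sequence for both functions. The function names are hypothetical, and the register names are reused as ordinary variable names.

```cpp
#include <cassert>
#include <cstdint>

// Three-operand form, as a single LEA would compute it: r1 + rsi * 1 + t.
inline std::uint32_t lea_form(std::uint32_t r1, std::uint32_t rsi,
                              std::uint32_t t) {
  return r1 + rsi * 1u + t;
}

// Two dependent additions, mirroring "r1 += t; r1 += rsi".
inline std::uint32_t add_form(std::uint32_t r1, std::uint32_t rsi,
                              std::uint32_t t) {
  r1 += t;
  r1 += rsi;
  return r1;
}
```

Unsigned 32-bit addition wraps around and is associative and commutative, so the order in which `t` and `rsi` are added does not change the result.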
------------- Commit messages: - 8296548: Improve MD5 intrinsic for x86_64 Changes: https://git.openjdk.org/jdk/pull/11054/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11054&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296548 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/11054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11054/head:pull/11054 PR: https://git.openjdk.org/jdk/pull/11054 From aboldtch at openjdk.org Wed Nov 9 08:26:08 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 9 Nov 2022 08:26:08 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <s0chXUFW6U2vWew4oUi6zLlrcj_a6Uzql2vNleYPKHA=.0f13741a-7b40-4116-9327-e67e8c19e4c2@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> <s0chXUFW6U2vWew4oUi6zLlrcj_a6Uzql2vNleYPKHA=.0f13741a-7b40-4116-9327-e67e8c19e4c2@github.com> Message-ID: <jLw3Cejr3PpV_aj00xcys-EcOt8G3FNuAVPpqnr1lLU=.21a5611f-27f6-48ad-9081-3c14f4b668c7@github.com> On Wed, 9 Nov 2022 07:23:42 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > How would this be different from what we are doing now? I meant with regards to one specific reentrant step. Such that it could break out of the iteration early after X crashes, separate from any VMError::report limit. > I agree, but my conclusion differs from yours: if a STEP does X, and X crashes repeatedly out, it makes sense at some point to stop doing X, since the probability that it will crash again is quite high. It is better to stop and proceed with the next STEPs, since their output may be more valuable than just another failed printout in the crashing STEP. 
If every print crashes then something seems very wrong. I was thinking more of the case where we get a few crashes in a specific subsystem, where a few repeated crashes on some specific address ranges help narrow down the problem. One crash might be unrelated, but two or three in a specific subsystem show a trend and help narrow down the root cause. > I agree with @dean-long, it would be good to know what is going wrong first. And harden the printing code with os::is_readable_pointer. Because SafeFetch, in contrast to this STEP mechanism, does not build up stack. Or, if applying SafeFetch is no reasonable option, retry the whole STEP with risky options disabled, as I did here: #10179. I agree, the norm should be that print location does not crash. As I mentioned in my reply above I have only seen it when I introduce it myself. If this feature is not within the requirements of hs_err printing, maybe it could be enabled with some diagnostic flag, as a development/debugging tool for those specific scenarios where crashes make print location unstable. But I have far too little experience and knowledge about the requirements of VMError::report, so I will leave the discussion to more veteran people. I do however think it would be a nice feature to have for the times where it actually matters.
------------- PR: https://git.openjdk.org/jdk/pull/11017 From dlong at openjdk.org Wed Nov 9 08:33:21 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 9 Nov 2022 08:33:21 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> Message-ID: <W-d91Gr-HVZA6VsryLt1dLddb_qAkx-3t596Jcge74M=.e19d404f-fe67-4d27-9731-6a6d4811e182@github.com> On Mon, 7 Nov 2022 13:24:26 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. > > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. > > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) I hit a register value that crashed recently. It wasn't a bad oop, but it was a derived oop. The value was an interior pointer to an array element. 
In general derived oops can be an arbitrary delta from the base oop, so identifying them would normally require looking at the oopmap, but it seems like interior pointers should be easier to handle. ------------- PR: https://git.openjdk.org/jdk/pull/11017 From rehn at openjdk.org Wed Nov 9 08:51:29 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 9 Nov 2022 08:51:29 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 In-Reply-To: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> Message-ID: <PtavhgLXXDsdSNOv8U0AOeDSOI0XeW007evoxkf2S70=.213aae48-1391-4feb-87ba-d10793513a1e@github.com> On Mon, 7 Nov 2022 14:34:45 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: > Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. > > This is split off from the main JEP integration to make reviewing easier. > > This includes the following patches: > > 1. https://github.com/openjdk/panama-foreign/pull/698 > 2. https://github.com/openjdk/panama-foreign/pull/699 > 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 > 4. https://github.com/openjdk/panama-foreign/pull/740 > 5. https://github.com/openjdk/panama-foreign/pull/746 > 6. https://github.com/openjdk/panama-foreign/pull/742 > 7. https://github.com/openjdk/panama-foreign/pull/743 > > Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. > > The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. 
> > Please refer to the PR of each individual patch for a more detailed description. Looks good (-+ few nits), the errno capture feature seems handy, thanks! ------------- Marked as reviewed by rehn (Reviewer). PR: https://git.openjdk.org/jdk/pull/11019 From alanb at openjdk.org Wed Nov 9 09:02:18 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 9 Nov 2022 09:02:18 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <J0bLxJBGH9VGh3qQDjg-GybaEo41W5rV_BYnJplr3G4=.d4a87ab5-0a4e-4785-b9b3-51960c06c969@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> <VD0zW_j55V_Tyk9YFRxJfiMR-a3HnChbPJzt7iE-cvo=.0c42b9a6-beda-405b-9ed5-af7f8c3af6d1@github.com> <J0bLxJBGH9VGh3qQDjg-GybaEo41W5rV_BYnJplr3G4=.d4a87ab5-0a4e-4785-b9b3-51960c06c969@github.com> Message-ID: <X2yeM1-JUMOJDkW0F0jCwdc6mkIKZQhiasNGPxcwPuY=.bfc7c0a8-db5f-4566-9177-66ed9ace692b@github.com> On Tue, 8 Nov 2022 23:45:18 GMT, David Holmes <dholmes at openjdk.org> wrote: > "in the custom class loader must append to the class path in a thread-safe manner." Maybe "its search path" rather than "the class path" because the custom class loader can't add to the class path (the application class path continues to be owned by the application class loader). 
------------- PR: https://git.openjdk.org/jdk/pull/11023 From alanb at openjdk.org Wed Nov 9 09:20:35 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 9 Nov 2022 09:20:35 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v14] In-Reply-To: <lEpO2Kc_JsL9bhFgf_zuibyNPm-O-zzgL9oJ0kxaDQY=.2027960f-2d7a-4ed6-b844-402a7bb478a2@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <lEpO2Kc_JsL9bhFgf_zuibyNPm-O-zzgL9oJ0kxaDQY=.2027960f-2d7a-4ed6-b844-402a7bb478a2@github.com> Message-ID: <l01fbF6hvDdtz__56Djm93FkoWC1l8XKv5Fyt_DaIiY=.346f3d23-9301-403c-8397-daf7e3c57622@github.com> On Tue, 8 Nov 2022 22:12:46 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo src/java.base/share/classes/java/lang/foreign/Arena.java line 131: > 129: * @param thread the thread to be tested. > 130: */ > 131: boolean isOwnedBy(Thread thread); A shared Arena can be closed by any thread. Should a shared Arena be considered as being owned by all threads so that this method always returns true for a non-null thread? In the old API, a shared memory session has no owner so it was a bit clearer. I think my comment is mostly about the method name being about ownership, whereas the javadoc is about who can close. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From sspitsyn at openjdk.org Wed Nov 9 09:23:17 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 9 Nov 2022 09:23:17 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v5] In-Reply-To: <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com> Message-ID: <WzN2VJ3a7ykSEbd4wO7UpXsrpFpen5yCPj591phW1q8=.efef7e44-daab-4ace-8850-e68b8d87c531@github.com> On Tue, 8 Nov 2022 14:55:17 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Forgot a null check. src/hotspot/share/prims/jvmtiEnvBase.cpp line 564: > 562: > 563: for (int i=0; i<length; i++) { > 564: objArray[i] = JNIHandles::make_local(groups->obj_at(i)); Nit: Spaces are missed around '=' and '<' signs. 
------------- PR: https://git.openjdk.org/jdk/pull/11033 From fyang at openjdk.org Wed Nov 9 09:34:08 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 9 Nov 2022 09:34:08 GMT Subject: RFR: 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation [v2] In-Reply-To: <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> Message-ID: <O-VlIkjjhznZ4NTbeOXjSob-EW1oBLkPxxHc9E5bBvM=.eb8e073c-0add-4012-9be3-30f198c7b618@github.com> On Tue, 8 Nov 2022 11:41:22 GMT, Gui Cao <gcao at openjdk.org> wrote: >> HI, >> >> The MaxReductionV, MinReductionV, AddReductionV nodes currently implemented by riscv rvv can be implemented by calling shared functions, and the T_BYTE and T_SHORT types in the MaxReductionV and MinReductionV node implementations can also be implemented in the same way as the T_INT type. >> >> Please take a look and have some reviews. Thanks a lot. >> >> ## Testing: >> - hotspot and jdk tier1 on unmatched board without new failures >> - test/jdk/jdk/incubator/vector/* with fastdebug on qemu > > Gui Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Use the same predicate as reduce_addI > - Remove the REDUCTION_OP enumeration type and use Opcode to represent the operation Nice refactoring work. Looks good. ------------- Marked as reviewed by fyang (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11036 From gcao at openjdk.org Wed Nov 9 09:34:09 2022 From: gcao at openjdk.org (Gui Cao) Date: Wed, 9 Nov 2022 09:34:09 GMT Subject: RFR: 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation [v2] In-Reply-To: <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> <OuB5KrAXxenm7tdaawbOYXY4R3DOdtb0TxUvMb9iBVg=.a6b14d8c-aeb0-47c6-8a9d-bc5e82f25588@github.com> Message-ID: <e1AuRzrxQqBzsEexeNpMZwk8yo8dMnWmIYfHVeeIbBE=.2e881de1-7585-4b13-a715-3a25b0fa6714@github.com> On Tue, 8 Nov 2022 11:41:22 GMT, Gui Cao <gcao at openjdk.org> wrote: >> HI, >> >> The MaxReductionV, MinReductionV, AddReductionV nodes currently implemented by riscv rvv can be implemented by calling shared functions, and the T_BYTE and T_SHORT types in the MaxReductionV and MinReductionV node implementations can also be implemented in the same way as the T_INT type. >> >> Please take a look and have some reviews. Thanks a lot. >> >> ## Testing: >> - hotspot and jdk tier1 on unmatched board without new failures >> - test/jdk/jdk/incubator/vector/* with fastdebug on qemu > > Gui Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Use the same predicate as reduce_addI > - Remove the REDUCTION_OP enumeration type and use Opcode to represent the operation Thanks for review. And all of `test/jdk/jdk/incubator/vector/*` test case passed. 
------------- PR: https://git.openjdk.org/jdk/pull/11036 From stuefe at openjdk.org Wed Nov 9 09:34:34 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 9 Nov 2022 09:34:34 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> Message-ID: <gOrgh1vItxNhOuZ7oE0grTTIAPDT3i_CXu-jQJ421RE=.119357af-bde5-44ce-865f-8f6adbd4b947@github.com> On Mon, 7 Nov 2022 13:24:26 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. > > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. > > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) Maybe a pragmatic combination of approaches would work, e.g. allow a limited number of recursive attempts for register printing, combined with looking at the printer. 
I would hope that most cases trip over a small number of issues and that we can solve them without wrapping every load with SafeFetch. That would admittedly be very ugly. I think we do this in places already, e.g. checking Klass* for validity (needs to be in committed class space, needs to be aligned, needs to point to mapped memory yada yada) ------------- PR: https://git.openjdk.org/jdk/pull/11017 From sspitsyn at openjdk.org Wed Nov 9 09:38:41 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 9 Nov 2022 09:38:41 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v5] In-Reply-To: <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com> Message-ID: <8IJbhIid1KAN_8xogA0fSvh0RIbYhzPBB5sMBVgQkBw=.ed314adf-1e6e-4ba8-a263-ede8118d4107@github.com> On Tue, 8 Nov 2022 14:55:17 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Forgot a null check. src/hotspot/share/prims/jvmtiEnvBase.cpp line 557: > 555: JvmtiEnvBase::new_jthreadGroupArray(int length, objArrayHandle groups) { > 556: if (length == 0) { > 557: return NULL; I do not think returning NULL is allowed for JVMTI `GetThreadGroupChildren()`. 
Please, see: [GetThreadGroupChildren](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html#GetThreadGroupChildren) ------------- PR: https://git.openjdk.org/jdk/pull/11033 From gcao at openjdk.org Wed Nov 9 09:42:23 2022 From: gcao at openjdk.org (Gui Cao) Date: Wed, 9 Nov 2022 09:42:23 GMT Subject: Integrated: 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation In-Reply-To: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> References: <X-Uto309S8g8v1PxRfkG2AY1kaYR25vJTCWUZxJnYGs=.afc4c41c-08c7-4f4f-8286-117b325e428a@github.com> Message-ID: <b6lmb00i3-jtzG5hNtE7dquEeDEz0R_GUimxtaQYygY=.86547137-5eb5-49c9-a856-22582407a4b2@github.com> On Tue, 8 Nov 2022 09:16:22 GMT, Gui Cao <gcao at openjdk.org> wrote: > HI, > > The MaxReductionV, MinReductionV, AddReductionV nodes currently implemented by riscv rvv can be implemented by calling shared functions, and the T_BYTE and T_SHORT types in the MaxReductionV and MinReductionV node implementations can also be implemented in the same way as the T_INT type. > > Please take a look and have some reviews. Thanks a lot. > > ## Testing: > - hotspot and jdk tier1 on unmatched board without new failures > - test/jdk/jdk/incubator/vector/* with fastdebug on qemu This pull request has now been integrated. 
Changeset: fef68bba Author: Gui Cao <gcao at openjdk.org> Committer: Fei Yang <fyang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/fef68bbaf6de7e0d4be311a5f3648c16548c5b4d Stats: 190 lines in 4 files changed: 21 ins; 117 del; 52 mod 8296515: RISC-V: Small refactoring for MaxReductionV/MinReductionV/AddReductionV node implementation Reviewed-by: luhenry, dzhang, yzhu, fyang ------------- PR: https://git.openjdk.org/jdk/pull/11036 From alanb at openjdk.org Wed Nov 9 09:51:37 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 9 Nov 2022 09:51:37 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v5] In-Reply-To: <8IJbhIid1KAN_8xogA0fSvh0RIbYhzPBB5sMBVgQkBw=.ed314adf-1e6e-4ba8-a263-ede8118d4107@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com> <8IJbhIid1KAN_8xogA0fSvh0RIbYhzPBB5sMBVgQkBw=.ed314adf-1e6e-4ba8-a263-ede8118d4107@github.com> Message-ID: <A3Qkq7t-qMy-88e_UoN0ka3WS5W2Eu0T9gVJ_-KtcY4=.248f7d7b-7a90-4ef1-8f9a-2f07296e5afc@github.com> On Wed, 9 Nov 2022 09:32:42 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Forgot a null check. > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 557: > >> 555: JvmtiEnvBase::new_jthreadGroupArray(int length, objArrayHandle groups) { >> 556: if (length == 0) { >> 557: return NULL; > > I do not think returning NULL is allowed for JVMTI `GetThreadGroupChildren()`. > Please, see: [GetThreadGroupChildren](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html#GetThreadGroupChildren) I don't think this has changed. Right now, if there are no child subgroups then *group_count_ptr will be 0 and *groups_ptr will be NULL as there is no memory to deallocate. 
JVMTI Deallocate is specified to do nothing when called with NULL. ------------- PR: https://git.openjdk.org/jdk/pull/11033 From xlinzheng at openjdk.org Wed Nov 9 09:53:25 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 9 Nov 2022 09:53:25 GMT Subject: RFR: 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null [v2] In-Reply-To: <ex_4VH3mqzj5oOy-DBjy2GaKZmtCoRq3h5xHmQ0Fe1c=.e49b3098-5205-4f5f-9c3d-e4ad7cc8375d@github.com> References: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com> <ex_4VH3mqzj5oOy-DBjy2GaKZmtCoRq3h5xHmQ0Fe1c=.e49b3098-5205-4f5f-9c3d-e4ad7cc8375d@github.com> Message-ID: <H7istqpte0xCAjyZiKw-swDT7jGTybU6bUgmNce1i6c=.7d96a930-4eb8-4dc9-bdec-845491e0582b@github.com> On Tue, 8 Nov 2022 14:40:39 GMT, Yadong Wang <yadongwang at openjdk.org> wrote: > @zhengxiaolinX Nice catch. Looks good, but I'm not sure that all temporary registers are used safely, especially where non-t0 is used? Thank you. I re-examined this patch, and they seem okay to me again. For the t2 usage in this patch, we may come to the UEP via the `call_stub` when leaving the C++ world, where gcc would help us save alive caller-saved registers. So we can use t2 here. Or, we come to the UEP from the Java world: t2 is SOC so it would have been saved before performing this call if alive. Some comments in itable stub[1] can also help to make it clear, and in vtable stub[2] it seems we also use a t2 as tmp. Would this be okay for you :-)? 
[1] https://github.com/openjdk/jdk/blob/82cbfb5fb0db61f3f1d9f0ceeed20c1cf5474652/src/hotspot/cpu/riscv/vtableStubs_riscv.cpp#L177-L178 [2] https://github.com/openjdk/jdk/blob/82cbfb5fb0db61f3f1d9f0ceeed20c1cf5474652/src/hotspot/cpu/riscv/vtableStubs_riscv.cpp#L84 ------------- PR: https://git.openjdk.org/jdk/pull/11010 From rehn at openjdk.org Wed Nov 9 10:07:36 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 9 Nov 2022 10:07:36 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> Message-ID: <yIaaqK4Eqwty8bn3dSjhEJXoKA3SNXM2pOO6NovgSho=.ccd162ae-5a19-4e17-ab3a-a05114a9a0ed@github.com> On Tue, 8 Nov 2022 16:19:47 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > Generational ZGC will need to patch nmethod instructions outside of safepoints, and guard entries into the nmethods with cross modifying code fences. This is mostly taken care of by nmethod entry barrier code. But there are a few entries that don't go through nmethod entry barriers that need fixing. In particular when entering an nmethod by returning through the stack watermark barrier. This patch ensures that whenever the stack watermark barrier exposes a new nmethod, we also ensure that a cross modify fence is executed, so that any concurrently updated instructions can be safely executed. I think David has a point here: with e.g. non-gen ZGC, returning from Fibonacci could be slowed down. I assume gen-ZGC will only need the CMF once per safepoint? If so, can we make it conditional per safepoint generation? Or make sure it is not a problem.
------------- PR: https://git.openjdk.org/jdk/pull/11042 From eosterlund at openjdk.org Wed Nov 9 10:17:04 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 9 Nov 2022 10:17:04 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <XZwERqvcDEA9iyhsPM5LljfPSocmFKRiQ_cHTYnBULY=.9647349b-94f3-4be7-8b36-1b0facfe8639@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> <XZwERqvcDEA9iyhsPM5LljfPSocmFKRiQ_cHTYnBULY=.9647349b-94f3-4be7-8b36-1b0facfe8639@github.com> Message-ID: <rAyPuMDllxJwFOd4d-a9Cbo_MjkmqdirKbWXFaLfZKs=.8f33ca2d-1877-4667-a45c-ab72d426cc3d@github.com> On Wed, 9 Nov 2022 01:53:48 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Generational ZGC will need to patch nmethod instructions outside of safepoints, and guard entries into the nmethods with cross modifying code fences. This is mostly taken care of by nmethod entry barrier code. But there are a few entries that don't go through nmethod entry barriers that need fixing. In particular when entering an nmethod by returning through the stack watermark barrier. This patch ensures that whenever the stack watermark barrier exposes a new nmethod, we also ensure that a cross modify fence is executed, so that any concurrently updated instructions can be safely executed. > > src/hotspot/share/runtime/safepointMechanism.cpp line 144: > >> 142: >> 143: update_poll_values(thread); >> 144: OrderAccess::cross_modify_fence(); > > Has this simply been moved to cover more paths? The new logic inside of update_poll_values makes it more accurate and made this cross_modify_fence redundant. That's why I removed it.
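[Editorial sketch, not part of the original message.] A toy model of why the fence at the call site becomes redundant, using the condition quoted elsewhere in this thread from safepointMechanism.cpp lines 106-107. Everything here is an assumption made for illustration: `ToyThread`, `toy_update_poll_word`, `toy_cross_modify_fence` and the two poll-word constants are hypothetical stand-ins, and the "fence" only increments a counter instead of serializing the instruction stream.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical poll-word values; the real HotSpot constants may differ.
constexpr std::uintptr_t kArmed = 0;
constexpr std::uintptr_t kDisarmed = ~std::uintptr_t(0);

struct ToyThread {
  std::uintptr_t poll_word = kArmed;
  int fences = 0;  // how many times the stand-in fence has run
};

// Stand-in for a cross-modify fence: it only counts invocations.
inline void toy_cross_modify_fence(ToyThread& t) { ++t.fences; }

// Mirrors the quoted condition: fence when the poll word changes,
// or when the previous poll word was the armed value.
inline void toy_update_poll_word(ToyThread& t, std::uintptr_t new_word) {
  const std::uintptr_t prev = t.poll_word;
  if (prev != new_word || prev == kArmed) {
    toy_cross_modify_fence(t);
  }
  t.poll_word = new_word;
}
```

In this model a caller never needs its own fence after `toy_update_poll_word`: every transition that could expose new code already ran the fence inside the update, and a disarmed-to-disarmed update skips it.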
------------- PR: https://git.openjdk.org/jdk/pull/11042 From eosterlund at openjdk.org Wed Nov 9 10:45:30 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 9 Nov 2022 10:45:30 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <XZwERqvcDEA9iyhsPM5LljfPSocmFKRiQ_cHTYnBULY=.9647349b-94f3-4be7-8b36-1b0facfe8639@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> <XZwERqvcDEA9iyhsPM5LljfPSocmFKRiQ_cHTYnBULY=.9647349b-94f3-4be7-8b36-1b0facfe8639@github.com> Message-ID: <uUYptwQe55wV_M5isGTs_xMUKFJ6lJUufFoUmJJ5y44=.ced3b2cd-021a-43ee-817a-80354b09c639@github.com> On Wed, 9 Nov 2022 01:55:02 GMT, David Holmes <dholmes at openjdk.org> wrote: > What are the performance implications here, if any? I don't like that everyone pays for this when it is only needed for genZGC. > > Thanks. When you don't use the concurrent stack processing code (i.e. G1, Serial, Parallel), then the arm value will be disarmed or armed, and we will run the cross_modify_fence once per handshake/safepoint only, when the thread disarms itself. We already did that before, only at the place where update_poll_values is called instead. So in that regard, I suppose there shouldn't be any difference unless concurrent stack processing is being used. And if it is, I measured the effect to be insignificant in aurora, so I didn't think it seemed worth it to introduce some new shared code abstraction to specify in a more fine-grained way when exactly you do or do not want this fence. I expect that soon all GCs that use concurrent stack processing also want to do this fence anyway, as you probably still want to have immediate oops embedded in nmethods because that does have a performance edge, which on some platforms does require this fence. We just haven't supported that yet.
Hope this explanation makes sense. > src/hotspot/share/runtime/safepointMechanism.cpp line 108: > >> 106: if (prev_poll_word != poll_word || >> 107: prev_poll_word == _poll_word_armed_value) { >> 108: // After updating the poll value, we allow entering new nmethods > > I'm a little confused about the positioning here. The comment says "after updating the poll value", but we haven't updated yet (happens below) so don't we need the fence after that point? The important thing is that it is called after the handshake/safepoint/stack watermark operation. The wording is just unfortunate. Maybe I should change it to "While updating the poll value" to be less confusing? Does that read better? ------------- PR: https://git.openjdk.org/jdk/pull/11042 From eosterlund at openjdk.org Wed Nov 9 10:45:31 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 9 Nov 2022 10:45:31 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <rAyPuMDllxJwFOd4d-a9Cbo_MjkmqdirKbWXFaLfZKs=.8f33ca2d-1877-4667-a45c-ab72d426cc3d@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> <XZwERqvcDEA9iyhsPM5LljfPSocmFKRiQ_cHTYnBULY=.9647349b-94f3-4be7-8b36-1b0facfe8639@github.com> <rAyPuMDllxJwFOd4d-a9Cbo_MjkmqdirKbWXFaLfZKs=.8f33ca2d-1877-4667-a45c-ab72d426cc3d@github.com> Message-ID: <qGmHOeXEV4B1b1nNVm-uilJvmyoOA5BJsybw418puOQ=.7c2e0cb2-7dc0-4b60-8009-1ef4345c3c9c@github.com> On Wed, 9 Nov 2022 10:13:55 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> src/hotspot/share/runtime/safepointMechanism.cpp line 144: >> >>> 142: >>> 143: update_poll_values(thread); >>> 144: OrderAccess::cross_modify_fence(); >> >> Has this simply been moved to cover more paths? > > The new logic inside of update_poll_values makes it more accurate and made this cross_modify_fence redundant. That's why I removed it. > Has this simply been moved to cover more paths?
I removed it because it is redundant after updating update_poll_values to more precisely identify when cross_modify_fence should be called. It will then already have been called inside of update_poll_values, when it was needed here at the call site. ------------- PR: https://git.openjdk.org/jdk/pull/11042 From mcimadamore at openjdk.org Wed Nov 9 11:00:33 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 9 Nov 2022 11:00:33 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v14] In-Reply-To: <l01fbF6hvDdtz__56Djm93FkoWC1l8XKv5Fyt_DaIiY=.346f3d23-9301-403c-8397-daf7e3c57622@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <lEpO2Kc_JsL9bhFgf_zuibyNPm-O-zzgL9oJ0kxaDQY=.2027960f-2d7a-4ed6-b844-402a7bb478a2@github.com> <l01fbF6hvDdtz__56Djm93FkoWC1l8XKv5Fyt_DaIiY=.346f3d23-9301-403c-8397-daf7e3c57622@github.com> Message-ID: <ZfUc1DFmpM2mmWQBHOHf7AmZaO3wAnEkIvqdNBpjSkI=.3d59e62e-57f5-4d2b-9e50-49d7e06841a1@github.com> On Wed, 9 Nov 2022 09:16:49 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix typo > > src/java.base/share/classes/java/lang/foreign/Arena.java line 131: > >> 129: * @param thread the thread to be tested. >> 130: */ >> 131: boolean isOwnedBy(Thread thread); > > A shared Arena can be closed by any thread. Should a shared Arena be considered as being owned by all threads so that this method always returns true for a non-null thread? In the old API, a shared memory session has no owner so it was a bit clearer. I think my comment is mostly about the method name being about ownership, whereas the javadoc is about who can close. Very good point - all threads are owners. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Wed Nov 9 11:42:47 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 9 Nov 2022 11:42:47 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v15] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <NH0djbSdM731jDLBOtWZhzbMBZnJP4rLuAA3bqdo9eg=.d5e650a4-2e07-42d6-9361-147f2b578888@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Rename isOwnedBy -> isCloseableBy Fix minor typos Fix StrLenTest/RingAllocator ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/df29e6a0..2d75f954 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=13-14 Stats: 11 lines in 3 files changed: 2 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From sspitsyn at openjdk.org Wed Nov 9 11:58:32 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 9 Nov 2022 11:58:32 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v5] In-Reply-To: <A3Qkq7t-qMy-88e_UoN0ka3WS5W2Eu0T9gVJ_-KtcY4=.248f7d7b-7a90-4ef1-8f9a-2f07296e5afc@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> 
<Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com> <8IJbhIid1KAN_8xogA0fSvh0RIbYhzPBB5sMBVgQkBw=.ed314adf-1e6e-4ba8-a263-ede8118d4107@github.com> <A3Qkq7t-qMy-88e_UoN0ka3WS5W2Eu0T9gVJ_-KtcY4=.248f7d7b-7a90-4ef1-8f9a-2f07296e5afc@github.com> Message-ID: <ivQj42oRzyo1GO_8_6nBpuld9wMZP0lN4oHpnURLO3Y=.efd44aa2-4d2d-4964-9541-b4998376dbef@github.com> On Wed, 9 Nov 2022 09:49:10 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 557: >> >>> 555: JvmtiEnvBase::new_jthreadGroupArray(int length, objArrayHandle groups) { >>> 556: if (length == 0) { >>> 557: return NULL; >> >> I do not think returning NULL is allowed for JVMTI `GetThreadGroupChildren()`. >> Please, see: [GetThreadGroupChildren](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html#GetThreadGroupChildren) > > I don't think this has changed. Right now, if there are no child subgroups then *group_count_ptr will be 0 and *groups_ptr will be NULL as there is no memory to deallocate. JVMTI Deallocate is specified to do nothing when called with NULL. Alan, you are right. This check existed before. ------------- PR: https://git.openjdk.org/jdk/pull/11033 From tschatzl at openjdk.org Wed Nov 9 12:04:37 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Nov 2022 12:04:37 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> Message-ID: <t2zrTAXW_T-i8Q5MmM-n5Yoplvb7Yc4z12LmhtVUYEc=.52d99ea4-6d5e-4657-9fa8-e6dbca7415d0@github.com> On Thu, 3 Nov 2022 16:06:47 GMT, Ashutosh Mehra <duke at openjdk.org> wrote: > This is an attempt to unify the two different approaches for using archived heap regions. 
Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions. > > In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap. > When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific. > When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region based collectors. > > This PR attempts to add new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also allows the flexibility for the GC implementation to decide where and how to reserve the space for the archived regions. For instance, G1 implementation can continue to attempt to allocate the space towards the end of the heap. > This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions viz G1, serial, parallel and epsilon. Some additional initial notes about the code: * please document the new APIs in `CollectedHeap` (something like "// Support for mapping archive regions into the heap" for 6 methods with tons of parameters is not sufficient) - it would also help reviewing. * also we should be careful with the naming as the term "region" is overloaded enough already; e.g. `heap_region_dealloc_supported` seems to miss an `archive`. * while we are at changing the API, the change should use appropriate types, i.e. 
unsigned ints/size_t for sizes. * there is quite a bit of inconsistency in the naming: sometimes the MemRegions are called "something_regions", sometimes "something_ranges", there is also "something_spaces"; the same with the associated counts that are sometimes called "count", and sometimes "num_regions". Summarizing all this, I would strongly suggest using the term "ranges" instead of regions for the areas/MemRegions from the archive. This would help a lot with code clarity in collectors that already use the term regions. Like @iklam, I strongly suggest splitting this change by collector (or at least basic+g1 and epsilon/serial/parallel) anyway. There will likely be considerable changes on top of your current stack of changes which may require working/reviewing on the tip. There are additional initial comments in the PR. I understand that some of them may become obsolete as we change the API. I did not really look at dumptime considerations which seem to be under discussion right now; I do not have an overview of the exact requirements/problems in that area, particularly with regard to JDK-8296344. The chosen API (and the intended execution flow during dump/runtime) is too underdocumented at the moment for me to give good comments. It would be nice to summarize that to start a discussion; maybe even discuss this on some mailing list instead of in the PR of a 2k LOC change. However, from the existing comments here, I do think it is a good idea to write out all chosen assumptions when dumping (e.g. the alignment boundary of 1MB), even if currently that is the minimum for all collectors.
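[Editorial sketch, not part of the original message.] Two of the points above, sizes expressed as `size_t` and alignment assumptions written out explicitly, could look roughly like the following. `ByteRange` and `total_range_bytes` are hypothetical names invented for this illustration; they are not HotSpot's `MemRegion` or any existing `CollectedHeap` API, and the 1MB value used in the usage note is only the example alignment mentioned in the review.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a MemRegion-like range, measured in bytes.
struct ByteRange {
  std::size_t start;      // byte offset of the first byte
  std::size_t byte_size;  // length in bytes
};

// Sums the sizes of `count` ranges into *total_out.
// Returns false (leaving *total_out untouched) if `alignment` is zero or if
// any range size is not a multiple of `alignment`.
// Inputs come first, the single output parameter comes last.
inline bool total_range_bytes(const ByteRange* ranges, std::size_t count,
                              std::size_t alignment, std::size_t* total_out) {
  if (alignment == 0) {
    return false;
  }
  std::size_t total = 0;
  for (std::size_t i = 0; i < count; i++) {
    if (ranges[i].byte_size % alignment != 0) {
      return false;  // the stated alignment assumption does not hold
    }
    total += ranges[i].byte_size;
  }
  *total_out = total;
  return true;
}
```

A caller would pass the dump-time alignment explicitly, for example `total_range_bytes(ranges, n, 1u << 20, &total)`, so the assumption is visible at the call site rather than implied.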
src/hotspot/share/cds/archiveHeapLoader.cpp line 100: > 98: heap_regions->runtime_regions(), > 99: is_open)) { > 100: heap_regions->set_state(ArchiveHeapRegions::HEAP_RESERVED); Indentation src/hotspot/share/cds/archiveUtils.cpp line 153: > 151: _dumptime_regions = MemRegion::create_array(max_region_count, mtInternal); > 152: _runtime_regions = MemRegion::create_array(max_region_count, mtInternal); > 153: _region_idx = (int *)os::malloc(sizeof(int) * max_region_count, mtInternal); Use `NEW_C_HEAP_ARRAY/FREE_C_HEAP_ARRAY` instead of directly using `os` methods. src/hotspot/share/cds/filemap.cpp line 2121: > 2119: UseCompressedOops ? p2i(CompressedOops::end()) : > 2120: UseG1GC ? p2i((address)G1CollectedHeap::heap()->reserved().end()) : 0L); > 2121: #endif Should be dropped. src/hotspot/share/gc/epsilon/epsilonHeap.cpp line 157: > 155: > 156: assert(alignment == 0 || is_aligned(res, alignment), "Allocated space " PTR_FORMAT > 157: " does not have requested alignment of " SIZE_FORMAT " bytes", p2i(res), alignment); Indentation src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 541: > 539: } > 540: > 541: bool G1CollectedHeap::alloc_archive_regions(MemRegion* dumptime_regions, int num_regions, MemRegion* runtime_regions, bool is_open) { I'm not completely clear why we need both dumptime and runtime ranges here. I can see that dumptime range size is used for getting sizes, and runtime ranges are then updated to reflect the actual mapping. Please do not intermix input/output parameters in the parameter list unless absolutely necessary. Additionally, documentation in `CollectedHeap` would have saved me an hour or so trying to understand the code. I.e. 
the signature of `alloc_archive_regions` should be: bool G1CollectedHeap::alloc_archive_regions(MemRegion* dumptime_regions, int num_regions, bool is_open, MemRegion* runtime_regions) { src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 555: > 553: "is not aligned to OS default page size", region_size); > 554: total_size += region_size; > 555: } Factor this out into a helper method somewhere; this code seems to be duplicated in all implementations too. Also some effort should be expended to factor out common code between collectors. The code for Epsilon/Serial/Parallel looks at least from the outside apart from minor differences like copy&paste. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 559: > 557: HeapRegion* hr = NULL; > 558: HeapWord* mem_end = NULL; > 559: HeapWord* mem_begin = NULL; Please use `nullptr` in new code everywhere instead of `NULL`. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 577: > 575: } > 576: return false; > 577: } This big loop could be moved out of this huge method by having `alloc_highest_free_region` take a number of contiguous regions to allocate; this would also remove the need for the bailout uncommits as `alloc_highest_free_region` at the lowest level can simply search for a contiguous range of bits instead of a single bit, and reserve the whole range at once. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 593: > 591: mem_begin = hr->bottom(); > 592: > 593: HeapRegion* prev_range_last_region = NULL; This block of code should have a high-level comment about what it does: * filling in runtime region information (start/end) * updating G1 region information (tops)/adding to archive set/printing allocation Even better, move this loop into a well-named (static?) helper method; this method is just way too long. 
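A minimal sketch of the review points above (inputs before outputs in the parameter list, `size_t` for counts, `nullptr` instead of `NULL`, and the size summation factored into one shared helper). The `Range` struct is a simplified stand-in for `MemRegion`, and `total_archive_range_size` is a hypothetical name; neither is taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for HotSpot's MemRegion; hypothetical, for illustration only.
struct Range {
  std::uintptr_t start;
  std::size_t byte_size;
};

// Hypothetical shared helper: sums the sizes of all archive ranges and checks
// that each size is a multiple of the given alignment.
// Inputs come first, the single output parameter (total_size) comes last.
// Returns false (and leaves *total_size untouched) if a range is misaligned.
bool total_archive_range_size(const Range* dumptime_ranges,
                              std::size_t num_ranges,
                              std::size_t alignment,
                              std::size_t* total_size) {
  assert(dumptime_ranges != nullptr);
  assert(total_size != nullptr);
  std::size_t sum = 0;
  for (std::size_t i = 0; i < num_ranges; i++) {
    if (alignment != 0 && dumptime_ranges[i].byte_size % alignment != 0) {
      return false;
    }
    sum += dumptime_ranges[i].byte_size;
  }
  *total_size = sum;
  return true;
}
```

Each collector's implementation could then call the one helper instead of repeating the loop, which is the duplication the comment on line 555 points at.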
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 603: > 601: curr_range->set_start(align_up(runtime_regions[i-1].end(), alignment)); > 602: } > 603: assert(is_aligned(curr_range->start(), alignment), "region does not start at OS default page size"); s/region/MemRegion/ or "range" src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 608: > 606: // Add each G1 region touched by the range to the old set, and set the top. > 607: HeapRegion* curr_region = _hrm.addr_to_region(curr_range->start()); > 608: HeapWord* last_address = curr_range->last(); s/old set/archive set/ src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 614: > 612: if (curr_region != prev_range_last_region) { > 613: _hr_printer.alloc(curr_region); > 614: _archive_set.add(curr_region); I think both statements should be moved where the allocation and setting the region type happens. Put all these updates to the region into a (static?) helper method. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 631: > 629: } > 630: > 631: bool G1CollectedHeap::check_archive_addresses(MemRegion* ranges, size_t count) { Please be consistent with parameters, i.e. ranges vs. regions, count vs. num_whatever src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 672: > 670: } > 671: > 672: void G1CollectedHeap::populate_archive_regions_bot_part(MemRegion* ranges, size_t count) { These two methods should be merged into one. `complete_archive_regions_alloc` is the only caller of `populate_archive_regions_bot_part` apparently. Again please check parameter names for consistency. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 696: > 694: assert(!is_init_completed(), "Expect to be called at JVM init time"); > 695: assert(ranges != NULL, "MemRegion array NULL"); > 696: assert(num_regions != 0, "No MemRegions provided"); The method could probably just find out all regions that are covered by the given ranges the same way as `alloc_archive_regions` did (i.e. 
summing up byte-sizes; btw, parameters contain both "ranges" and "regions") and uncommit them en-bloc. This seems to go range by range and find out which regions they correspond to and uncommit them one by one. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 706: > 704: // notify mark-sweep that the range is no longer to be considered 'archive.' > 705: MutexLocker x(Heap_lock); > 706: for (int i = 0; i < num_regions; i++) { Don't use ints for counts. src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 816: > 814: } > 815: > 816: // Cannot use verbose=true because Metaspace is not initialized That comment is confusing. src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 828: > 826: curr_range->set_start(result); > 827: } else { > 828: // next range should be aligned to page size Sentences in comments should start with upper case and end with a full stop (I'm only commenting this once here). ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.org/jdk/pull/10970 From alanb at openjdk.org Wed Nov 9 12:05:35 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 9 Nov 2022 12:05:35 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v14] In-Reply-To: <lEpO2Kc_JsL9bhFgf_zuibyNPm-O-zzgL9oJ0kxaDQY=.2027960f-2d7a-4ed6-b844-402a7bb478a2@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <lEpO2Kc_JsL9bhFgf_zuibyNPm-O-zzgL9oJ0kxaDQY=.2027960f-2d7a-4ed6-b844-402a7bb478a2@github.com> Message-ID: <s2MCZ3s24-cmdanKefZCacjxsI0k_1boT5dqXEb-q6k=.e91b667a-58c8-46ab-9dcf-50cfc8ca7f64@github.com> On Tue, 8 Nov 2022 22:12:46 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo src/java.base/share/classes/java/lang/foreign/Arena.java line 125: > 123: */ > 124: @Override > 125: void close(); I'm trying to understand how close interacts with whileAlive on its memory session. Does close throw or block when there is a critical action running? The javadoc doesn't say right now. src/java.base/share/classes/java/lang/foreign/MemorySession.java line 43: > 41: * <p> > 42: * Conversely, a bounded memory session has a start and an end. Bounded memory sessions can be managed either > 43: * explicitly, (i.e. using an {@linkplain Arena arena}) or implicitly, by the garbage collector. When a bounded memory A minor style thing here is that this should probably be "using an {@link Arena}" as you really mean the Arena. This helps a bit with the generated docs as it shows up currently as "arenaPREVIEW", if you see what I mean. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From yadongwang at openjdk.org Wed Nov 9 12:13:26 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Wed, 9 Nov 2022 12:13:26 GMT Subject: RFR: 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null [v2] In-Reply-To: <H7istqpte0xCAjyZiKw-swDT7jGTybU6bUgmNce1i6c=.7d96a930-4eb8-4dc9-bdec-845491e0582b@github.com> References: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com> <ex_4VH3mqzj5oOy-DBjy2GaKZmtCoRq3h5xHmQ0Fe1c=.e49b3098-5205-4f5f-9c3d-e4ad7cc8375d@github.com> <H7istqpte0xCAjyZiKw-swDT7jGTybU6bUgmNce1i6c=.7d96a930-4eb8-4dc9-bdec-845491e0582b@github.com> Message-ID: <LM59vc8vUxShy162h3UUe5_Gs2VOyoOx4OY0YVwur-Y=.1260560b-1172-46f9-9c95-3b9e657989c6@github.com> On Wed, 9 Nov 2022 09:51:11 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> @zhengxiaolinX Nice catch. Looks good, but I'm not sure that all temporary registers are used safely, especially where non-t0 is used? > >> @zhengxiaolinX Nice catch. Looks good, but I'm not sure that all temporary registers are used safely, especially where non-t0 is used? > > Thank you. > I re-examined this patch, and they seem okay to me again. > For the t2 usage in this patch, we may come to the UEP via the `call_stub` when leaving the C++ world, where gcc would help us save alive caller-saved registers, so we can use t2 here. Or, we come to the UEP from the Java world: t2 is SOC so it would have been saved before performing this call if alive. > Some comments in itable stub[1] can also help to make it clear, and in vtable stub[2] it seems we also use a t2 as tmp. > Would this be okay for you :-)? 
> > [1] https://github.com/openjdk/jdk/blob/82cbfb5fb0db61f3f1d9f0ceeed20c1cf5474652/src/hotspot/cpu/riscv/vtableStubs_riscv.cpp#L177-L178 > [2] https://github.com/openjdk/jdk/blob/82cbfb5fb0db61f3f1d9f0ceeed20c1cf5474652/src/hotspot/cpu/riscv/vtableStubs_riscv.cpp#L84 
It's ok for me, and thanks for the explanation. ------------- PR: https://git.openjdk.org/jdk/pull/11010 From eosterlund at openjdk.org Wed Nov 9 12:15:31 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 9 Nov 2022 12:15:31 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <yIaaqK4Eqwty8bn3dSjhEJXoKA3SNXM2pOO6NovgSho=.ccd162ae-5a19-4e17-ab3a-a05114a9a0ed@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> <yIaaqK4Eqwty8bn3dSjhEJXoKA3SNXM2pOO6NovgSho=.ccd162ae-5a19-4e17-ab3a-a05114a9a0ed@github.com> Message-ID: <3znrZnIih9b9Y8_6jkcCV_TPZi5N4UNUTzqDAP67Mug=.f48d2593-8fdc-4119-87fc-e53ad59c5cdd@github.com> On Wed, 9 Nov 2022 10:01:20 GMT, Robbin Ehn <rehn at openjdk.org> wrote: > I think David have a point here, using e.g. non-gen ZGC returning from Fibonacci could be slowed-down. I assume gen-ZGC will only need the CMF once per safepoint? If so can we make it conditional per safepoint generation? > > Or make sure it is not a problem. I'm not quite sure what you mean by making it conditional per safepoint generation? 
------------- PR: https://git.openjdk.org/jdk/pull/11042 From xlinzheng at openjdk.org Wed Nov 9 12:28:31 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 9 Nov 2022 12:28:31 GMT Subject: RFR: 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null [v2] In-Reply-To: <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com> References: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com> Message-ID: <hgs_9HpFZ4hrIuvj9OhIHqszTWPvWz7pDX2tdqRKxLc=.d033a653-3b5a-41b2-8539-5780df5e0a06@github.com> On Tue, 8 Nov 2022 09:00:38 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> Please see the JBS issue for more crash details. >> >> To reproduce using a cross-compiled build: >> >> # dump one cds-nocoops.jsa >> <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:dump -Xlog:cds* -version >> >> # reproduce >> <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:on -XX:-TieredCompilation \ >> -Xlog:cds* -Xlog:gc+metaspace=info -jar renaissance-gpl-0.14.1.jar -r 1 movie-lens >> >> >> `MacroAssembler::en/decode_klass_not_null` uses the heapbase register as a temp register in the interpreter, which may kill the in-use value when enabling C2 compilation and `UseCompressedClassPointers` meanwhile disabling `UseCompressedOops`. C1 won't have this issue for the xheapbase is not its allocation candidate. When CDS is enabled, the narrow klass base is mapped to some address like `0x0000000800000000`, so `MacroAssembler::decode_klass_not_null`, which lacks registers, will use `xheapbase` as a temp to load the klass base and kill the register in the interpreter. So adding a `-XX:+DeoptimizeALot` can speedily reproduce the issue. 
>> >> To solve this, we shall decouple the xheapbase used as a temp register in `MacroAssembler::en/decode_klass_not_null`. AArch64 has advanced instructions so one register is enough to handle the logic. But in RISC-V we require at least two. >> >> This patch introduces another argument `tmp` to related functions to decouple and eliminate such heapbase usages. One thing that deserves noticing is the `cmp_klass` case, which usually gets used at the beginning of a method entry when `t1` is used as ic holder klass and `t0` is occupied there. These positions are special since nearly all registers are usable except ones used for arguments and special purposes (thread register, etc.). I propose to use a call-clobbered `t2` register here, to keep aligning the `i2c2i_adapter` logic[1]. >> >> Tested hotspot tier1~4 on QEMU; jdk tier1~tier2 and hotspot tier1~tier2 on my Hifive unmatched board, and the reproducible movie-lens benchmark. >> >> Thanks, >> Xiaolin >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L629 > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Fix as to comments Thank you for taking time to review this patch! Let's move on then. 
------------- PR: https://git.openjdk.org/jdk/pull/11010 From rehn at openjdk.org Wed Nov 9 12:36:41 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 9 Nov 2022 12:36:41 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> Message-ID: <4ekORiTOzhjG-Y2cfKlPrT7b65TOz-jJxliD9t1uBjU=.2587ccf1-7e12-4cc3-8dac-adb5fad1c107@github.com> On Tue, 8 Nov 2022 16:19:47 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > Generational ZGC will need to patch nmethod instructions outside of safepoints, and guard entries into the nmethods with cross modifying code fences. This is mostly taken care of by nmethod entry barrier code. But there are a few entries that don't go through nmethod entry barriers that need fixing. In particular when entering an nmethod by returning through the stack watermark barrier. This patch ensures that whenever the stack watermark barrier exposes a new nmethod, we also ensure that a cross modify fence is executed, so that any concurrently updated instructions can be safely executed. Marked as reviewed by rehn (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/11042 From rehn at openjdk.org Wed Nov 9 12:36:41 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 9 Nov 2022 12:36:41 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <3znrZnIih9b9Y8_6jkcCV_TPZi5N4UNUTzqDAP67Mug=.f48d2593-8fdc-4119-87fc-e53ad59c5cdd@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> <yIaaqK4Eqwty8bn3dSjhEJXoKA3SNXM2pOO6NovgSho=.ccd162ae-5a19-4e17-ab3a-a05114a9a0ed@github.com> <3znrZnIih9b9Y8_6jkcCV_TPZi5N4UNUTzqDAP67Mug=.f48d2593-8fdc-4119-87fc-e53ad59c5cdd@github.com> Message-ID: <QqWbNSeUCN8McKIpJNkiJ3eT22wa4pohGfSUyqu8J1U=.d0219b4c-944e-4086-a6de-82606ce2f053@github.com> On Wed, 9 Nov 2022 12:11:46 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > > I think David have a point here, using e.g. non-gen ZGC returning from Fibonacci could be slowed-down. I assume gen-ZGC will only need the CMF once per safepoint? If so can we make it conditional per safepoint generation? > > Or make sure it is not a problem. > > I'm not quite sure what you mean by making it conditional per safepoint generation? It can mean a couple of things. My original thought was wrong, but we know that code stream oops only need processing once between the start of safepoint A and the start of safepoint B. But different nmethods can be processed at different times. So one CMF per nmethod in such a safepoint epoch would be the maximum number of CMFs we need. This seems to map very well to "processing_completed_acquire()" and "start_processing -> on_safepoint()". You are saying that using this information to do a more fine-grained CMF is not worth the trouble. I cannot directly say you are correct; assuming you are, looks good. 
------------- PR: https://git.openjdk.org/jdk/pull/11042 From mcimadamore at openjdk.org Wed Nov 9 12:57:00 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 9 Nov 2022 12:57:00 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v16] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <A7xNrmeNo7QAzd4Gk3jpF972G49QzYpAu1JzkBWvd-Q=.62dd27c9-39f6-4c4d-ae7c-1661324212d4@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with four additional commits since the last revision: - Merge pull request #15 from minborg/test Add @apiNote to package-info - Add @apiNote to package-info - Merge pull request #16 from minborg/fix-tests2 Fix failing tests - Fix failing tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/2d75f954..39521344 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=14-15 Stats: 11 lines in 3 files changed: 4 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From ayang at openjdk.org Wed Nov 9 13:19:23 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 9 Nov 2022 13:19:23 GMT Subject: RFR: 8295871: G1: Use different explicit claim marks for CLDs In-Reply-To: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> References: 
<I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> Message-ID: <jzNgSnfX-G5q4lbbE4ykGzWZsfZf9vSJfdh9kP1Qjpo=.8c140522-b6c7-4b1e-b77f-22467e5a3bd8@github.com> On Fri, 4 Nov 2022 14:46:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: > Hi all, > > can I have reviews for this follow-up to [JDK-8295118](https://bugs.openjdk.org/browse/JDK-8295118) that removes the need to clear CLD claim marks for every full gc phase by using different claim values for the different phases. > > Some comments: > * I used new g1 specific claim values instead of overloading the existing ones, which is imho clearer. I am open to better names, but something like `_claim_strong_2/3` seemed too cryptic. Then again, there is now a collector specific name in the enum. Maybe the enum values should be made collector-specific in some way? Currently they already are (e.g. `_claim_finalizable` is only used in ZGC) as G1 does not need the values except for (multiple) `_claim_strong`. > * I moved the CLD mark verification for the mark phase from `prepare_collection` to the constructor of `G1FullGCMarker`; I think this place is more fitting as directly above there is the use in the `CLDToOopClosure`. Also this pattern aligns with the use in the `G1FullGCAdjustTask`. > > Testing: tier1-5 > > Thanks, > Thomas If the same technique can be used by Serial/Parallel (I believe so), I'd prefer sth more generic, `_claim_stw_fullgc_mark/adjust`. (I am surprised that `_claim_finalizable` is used only by ZGC -- this essentially mirrors finalizable/strong marking, needed for conc ref-processing.) ------------- Marked as reviewed by ayang (Reviewer). 
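The idea behind using a separate claim value per full GC phase can be sketched as follows. This is a simplified, hypothetical model using `std::atomic`, not the HotSpot `ClassLoaderData` code; the class name `ClaimableData` and the claim names (modelled on the `_claim_stw_fullgc_mark/adjust` suggestion above) are assumptions for illustration. Because each phase claims its own bit, a claim left over from the mark phase does not block the adjust phase, so the claim word does not have to be cleared in between.

```cpp
#include <atomic>

// Illustrative claim bits; one bit per phase. Names are hypothetical.
enum Claim : int {
  claim_none              = 0,
  claim_stw_fullgc_mark   = 1 << 0,
  claim_stw_fullgc_adjust = 1 << 1
};

// Hypothetical, simplified stand-in for a class loader data (CLD) claim word.
class ClaimableData {
  std::atomic<int> _claim{claim_none};

public:
  // Returns true for exactly one caller per claim value. Callers that find
  // the bit already set, or that lose the race for it, get false.
  bool try_claim(int claim) {
    int old_claim = _claim.load();
    for (;;) {
      if ((old_claim & claim) == claim) {
        return false;  // Already claimed for this phase.
      }
      // On failure, compare_exchange_weak reloads old_claim and we retry.
      if (_claim.compare_exchange_weak(old_claim, old_claim | claim)) {
        return true;
      }
    }
  }

  // Only needed once all phases are done, not between phases.
  void clear_claim() { _claim.store(claim_none); }
};
```

With a single shared claim value, the second phase would see the bit already set and would need a `clear_claim()` pass over all CLDs first; with distinct bits that pass can be dropped.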
PR: https://git.openjdk.org/jdk/pull/10989 From mcimadamore at openjdk.org Wed Nov 9 13:24:54 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 9 Nov 2022 13:24:54 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v17] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <Y6p315Q6Xv7ZWYDz0yMkyJxjYVxz6Iomu5phcti2KJE=.5f20042a-4d70-4dad-88ba-2883e1f8ab5f@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Tweak Arena::close javadoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/39521344..cd3fbe7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=15-16 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From stefank at openjdk.org Wed Nov 9 14:20:57 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Nov 2022 14:20:57 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v4] In-Reply-To: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: 
<9hoxdCjZvhZnTe3MxdabVMelQ0Tsghi3E3i6lHe6l5g=.74b17dd8-afb3-43f8-bd26-34d4d26c7373@github.com> > Background to this patch: > > This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. > > PR RFC: > > HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: > > MetaspaceObj - allocates in the Metaspace > CHeap - uses malloc > ResourceObj - ... > > The last class sounds like it provides an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Arenas or even CHeap allocated memory. > > This is IMHO misleading, and often leads to confusion among HotSpot developers. > > I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. > > In my proposal and prototype I've used the name AnyObj, as a short, simple name. I'm open to changing the name to something else. > > The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. > > The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - Fix after merge - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types - Remove riscv empty destructors - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types - Work around gtest exception compilation issues - Fix Shenandoah - Remove AnyObj new operator taking an allocation_type - Use more specific allocation types ------------- Changes: https://git.openjdk.org/jdk/pull/10745/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10745&range=03 Stats: 497 lines in 164 files changed: 82 ins; 53 del; 362 mod Patch: https://git.openjdk.org/jdk/pull/10745.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10745/head:pull/10745 PR: https://git.openjdk.org/jdk/pull/10745 From stefank at openjdk.org Wed Nov 9 14:21:02 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Nov 2022 14:21:02 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v3] In-Reply-To: <p9yT4iQ_9jf2UlD-ILu6l7_uD7YDFGlWoCp8uw9buHI=.4702a6ae-9a7e-4ca4-a23f-ca99ee3b01e2@github.com> References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> <p9yT4iQ_9jf2UlD-ILu6l7_uD7YDFGlWoCp8uw9buHI=.4702a6ae-9a7e-4ca4-a23f-ca99ee3b01e2@github.com> Message-ID: <KB_QRiLeiLBKRQnc0pkuHCKKu9RrqwHBAn4M9Irw_-I=.2eb841b1-9454-4828-800f-5bdacaa4538b@github.com> On Fri, 21 Oct 2022 10:25:03 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? 
Do we really have to provide MEMFLAGS as compile time flags? Etc. >> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provides an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Arenas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers. >> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as a short, simple name. I'm open to changing the name to something else. >> >> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >> >> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types > - Work around gtest exception compilation issues > - Fix Shenandoah > - Remove AnyObj new operator taking an allocation_type > - Use more specific allocation types Heads-up: I've removed the virtual destructors from risc-v, merged with latest, and started a tier1-7 test run. If this passes our testing then I intend to push the proposed changes. 
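To illustrate the general pattern the RFC describes, where a base class fixes the allocation strategy of everything that inherits from it, here is a small standalone sketch. It is not HotSpot code: `Arena`, `ArenaObjSketch`, `CHeapObjSketch`, `Point` and `Counter` are hypothetical names, the arena is a toy bump allocator, and the MEMFLAGS argument is left out entirely.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Toy bump-pointer arena; memory is released in bulk when the arena dies.
class Arena {
  alignas(std::max_align_t) unsigned char _buf[1024];
  std::size_t _used = 0;

public:
  void* allocate(std::size_t size) {
    const std::size_t a = alignof(std::max_align_t);
    const std::size_t aligned = (size + a - 1) / a * a;  // Round up to alignment.
    if (_used + aligned > sizeof(_buf)) {
      throw std::bad_alloc();
    }
    void* p = _buf + _used;
    _used += aligned;
    return p;
  }
  std::size_t used() const { return _used; }
};

// Subclasses can only be created inside a caller-provided arena.
class ArenaObjSketch {
public:
  void* operator new(std::size_t size, Arena* arena) { return arena->allocate(size); }
  // Matching placement delete, used only if a constructor throws.
  void operator delete(void*, Arena*) {}
};

// Subclasses are always malloc-backed.
class CHeapObjSketch {
public:
  void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) {
      throw std::bad_alloc();
    }
    return p;
  }
  void operator delete(void* p) { std::free(p); }
};

struct Point : ArenaObjSketch {
  int x;
  int y;
  Point(int x_, int y_) : x(x_), y(y_) {}
};

struct Counter : CHeapObjSketch {
  int value = 0;
};
```

In this sketch `new (&arena) Point(1, 2)` compiles while a plain `new Point(1, 2)` does not, which is the kind of "one class, one strategy" clarity the proposal aims for; a multi-strategy class such as the proposed AnyObj would instead offer several `operator new` overloads.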
------------- PR: https://git.openjdk.org/jdk/pull/10745 From jbhateja at openjdk.org Wed Nov 9 15:59:27 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Nov 2022 15:59:27 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9] In-Reply-To: <Okq3ER5gNRyGhWX8ZRWmxDr0hyzs0DJ15EHul8xoloA=.6c7980ce-89da-4efa-80af-83e1da66945b@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <Okq3ER5gNRyGhWX8ZRWmxDr0hyzs0DJ15EHul8xoloA=.6c7980ce-89da-4efa-80af-83e1da66945b@github.com> Message-ID: <jzdvvRS7lq0nJaImLvNtUm10ME5ZG5zIJlv6hZBe9oE=.9129c20d-5222-49f9-ac3f-bfa6142f4ce8@github.com> On Tue, 8 Nov 2022 23:21:58 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s 
>> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > fix 32-bit build src/hotspot/cpu/x86/vm_version_x86.cpp line 1181: > 1179: #ifdef _LP64 > 1180: if (supports_avx512ifma() & supports_avx512vlbw()) { > 1181: if (FLAG_IS_DEFAULT(UsePolyIntrinsics)) { A `MaxVectorSize > 32` check can be added along with the feature checks, since your code mainly uses ZMMs. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From tschatzl at openjdk.org Wed Nov 9 16:12:59 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Nov 2022 16:12:59 GMT Subject: RFR: 8295871: G1: Use different explicit claim marks for CLDs In-Reply-To: <jzNgSnfX-G5q4lbbE4ykGzWZsfZf9vSJfdh9kP1Qjpo=.8c140522-b6c7-4b1e-b77f-22467e5a3bd8@github.com> References: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> <jzNgSnfX-G5q4lbbE4ykGzWZsfZf9vSJfdh9kP1Qjpo=.8c140522-b6c7-4b1e-b77f-22467e5a3bd8@github.com> Message-ID: <KlLq1d3F9A9Mn-5Lg_KX-Yixb-QwLcW3mclMFi_avIw=.2a557e45-fc56-4e09-b132-435d543162f1@github.com> On Wed, 9 Nov 2022 13:17:03 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Hi all, >> >> can I have reviews for this follow-up to [JDK-8295118](https://bugs.openjdk.org/browse/JDK-8295118) that removes the need to clear CLD claim marks for every full gc phase by using different claim values for the different phases. >> >> Some comments: >> * I used new g1 specific claim values instead of overloading the existing ones, which is imho clearer. I am open to better names, but something like `_claim_strong_2/3` seemed too cryptic. Then again, there is now a collector specific name in the enum. 
Maybe the enum values should be made collector-specific in some way? Currently they already are (e.g. `_claim_finalizable` is only used in ZGC) as G1 does not need the values except for (multiple) `_claim_strong`. >> * I moved the CLD mark verification for the mark phase from `prepare_collection` to the constructor of `G1FullGCMarker`; I think this place is more fitting as directly above there is the use in the `CLDToOopClosure`. Also this pattern aligns with the use in the `G1FullGCAdjustTask`. >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > If the same technique can be used by Serial/Parallel (I believe so), I'd prefer sth more generic, `_claim_stw_fullgc_mark/adjust`. > > (I am surprised that `_claim_finalizable` is used only by ZGC -- this essentially mirrors finalizable/strong marking, needed for conc ref-processing.) @albertnetymk : I'll change the names according your suggestions and implement the optimization for serial/parallel gc too. Currently testing. ------------- PR: https://git.openjdk.org/jdk/pull/10989 From luhenry at openjdk.org Wed Nov 9 17:02:16 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 9 Nov 2022 17:02:16 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v7] In-Reply-To: <c1nZeXh-1Miu710qHFX206uWEZMpjvfrLP7I25w2TKs=.77412776-afcb-416d-8a2e-b73be619236f@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <USqZ-hj48jmO86DWscqbw3IxSfhfVKO1V_TBJ7SJVqc=.d42b651d-9191-4134-a60f-01f7f3093c57@github.com> <c1nZeXh-1Miu710qHFX206uWEZMpjvfrLP7I25w2TKs=.77412776-afcb-416d-8a2e-b73be619236f@github.com> Message-ID: <4_6nSpcKJKVkg8zdPx-u-SPIB7AFVx_9e2KZS_jjbaY=.6f8205db-81a0-4a5a-ad37-1711ff522234@github.com> On Wed, 9 Nov 2022 02:35:30 GMT, Yadong Wang <yadongwang at openjdk.org> wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> support large disp values > > 
src/hotspot/cpu/riscv/riscv.ad line 5197: > >> 5195: >> 5196: ins_encode %{ >> 5197: if ((($mem$$disp) & 0x1f) == 0) { > > Does this branch only check the alignment, not the allowed range? Fixed. ------------- PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Wed Nov 9 17:02:13 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 9 Nov 2022 17:02:13 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v8] In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> Message-ID: <mKll40RxkRgJqFx273bfgeNaylWIT6GqjA8rxMuJ9tU=.4b9b70e5-d36d-4d83-a78c-18dfe84d8da1@github.com> > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. 
> > It passes `hotspot:tier1` test suite Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10884/files - new: https://git.openjdk.org/jdk/pull/10884/files/42a61aa8..0e92909e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=06-07 Stats: 8 lines in 1 file changed: 2 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10884.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10884/head:pull/10884 PR: https://git.openjdk.org/jdk/pull/10884 From tschatzl at openjdk.org Wed Nov 9 17:30:47 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Nov 2022 17:30:47 GMT Subject: RFR: 8295871: G1: Use different explicit claim marks for CLDs [v2] In-Reply-To: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> References: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> Message-ID: <l1JcENUN2vXgRreobH9lPAdE55Kb0VNVRHlYLsQ8qIo=.0533dc50-6157-463b-87e4-f5798d300e48@github.com> > Hi all, > > can I have reviews for this follow-up to [JDK-8295118](https://bugs.openjdk.org/browse/JDK-8295118) that removes the need to clear CLD claim marks for every full gc phase by using different claim values for the different phases. > > Some comments: > * I used new g1 specific claim values instead of overloading the existing ones, which is imho clearer. I am open to better names, but something like `_claim_strong_2/3` seemed too cryptic. Then again, there is now a collector specific name in the enum. Maybe the enum values should be made collector-specific in some way? Currently they already are (e.g. `_claim_finalizable` is only used in ZGC) as G1 does not need the values except for (multiple) `_claim_strong`. 
> * I moved the CLD mark verification for the mark phase from `prepare_collection` to the constructor of `G1FullGCMarker`; I think this place is more fitting as directly above there is the use in the `CLDToOopClosure`. Also this pattern aligns with the use in the `G1FullGCAdjustTask`. > > Testing: tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: rename claim bits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10989/files - new: https://git.openjdk.org/jdk/pull/10989/files/8631a235..48c96fc4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10989&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10989&range=00-01 Stats: 21 lines in 6 files changed: 5 ins; 3 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/10989.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10989/head:pull/10989 PR: https://git.openjdk.org/jdk/pull/10989 From tschatzl at openjdk.org Wed Nov 9 17:30:47 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Nov 2022 17:30:47 GMT Subject: RFR: 8295871: G1: Use different explicit claim marks for CLDs In-Reply-To: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> References: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> Message-ID: <ylLFhT8sQXHTFPytsmCOZYiwbIrjeFq7uzfsgcz_-3w=.bbdbc2cb-f67d-4e1d-9310-d34b66bdaa7d@github.com> On Fri, 4 Nov 2022 14:46:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: > Hi all, > > can I have reviews for this follow-up to [JDK-8295118](https://bugs.openjdk.org/browse/JDK-8295118) that removes the need to clear CLD claim marks for every full gc phase by using different claim values for the different phases. > > Some comments: > * I used new g1 specific claim values instead of overloading the existing ones, which is imho clearer. 
I am open to better names, but something like `_claim_strong_2/3` seemed too cryptic. Then again, there is now a collector specific name in the enum. Maybe the enum values should be made collector-specific in some way? Currently they already are (e.g. `_claim_finalizable` is only used in ZGC) as G1 does not need the values except for (multiple) `_claim_strong`. > * I moved the CLD mark verification for the mark phase from `prepare_collection` to the constructor of `G1FullGCMarker`; I think this place is more fitting as directly above there is the use in the `CLDToOopClosure`. Also this pattern aligns with the use in the `G1FullGCAdjustTask`. > > Testing: tier1-5 > > Thanks, > Thomas I decided to just do the rename and do serial/parallel support using this technique in a follow-up CR. I also explicitly made the g1 concurrent mark claim closures use `_claim_strong` so that (I think) the code is more clear. ------------- PR: https://git.openjdk.org/jdk/pull/10989 From duke at openjdk.org Wed Nov 9 17:53:21 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 9 Nov 2022 17:53:21 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9] In-Reply-To: <cqMESA9bq6nErMs9ckafTqczQEz33XgEVax014m0hTM=.26c6bdfe-afb6-4406-941b-fc2ce1030389@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <Okq3ER5gNRyGhWX8ZRWmxDr0hyzs0DJ15EHul8xoloA=.6c7980ce-89da-4efa-80af-83e1da66945b@github.com> <cqMESA9bq6nErMs9ckafTqczQEz33XgEVax014m0hTM=.26c6bdfe-afb6-4406-941b-fc2ce1030389@github.com> Message-ID: <68kLqa_FyCKuyBqSpXlq_1QnR8I-HFL1aFr4uQ6DyoM=.817059a2-8945-44e7-9ce8-a5f5ee5a50c5@github.com> On Wed, 9 Nov 2022 00:10:48 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> fix 32-bit build > > src/hotspot/share/opto/library_call.cpp line 7014: > >> 7012: const 
TypeKlassPtr* rklass = TypeKlassPtr::make(instklass_ImmutableElement); >> 7013: const TypeOopPtr* rtype = rklass->as_instance_type()->cast_to_ptr_type(TypePtr::NotNull); >> 7014: Node* rObj = new CheckCastPPNode(control(), rFace, rtype); > > FTR it's an unsafe cast since it doesn't involve a runtime check from `IntegerModuloP` to `ImmutableElement`. Please, lift as much checks into Java wrapper as possible. @iwanowww just to save some of your time... Sandhya suggested another way to move the checks to Java, hopefully without too much penalty to non-intrinsic path. Should upload it later today. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From jvernee at openjdk.org Wed Nov 9 18:16:59 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 9 Nov 2022 18:16:59 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v2] In-Reply-To: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> Message-ID: <dMQ238rubrgCcyRv-456L2yibOvwrz1gWFPUOQTM5TQ=.31dcad26-a466-4806-ab72-33eb1c89a6ef@github.com> > Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. > > This is split off from the main JEP integration to make reviewing easier. > > This includes the following patches: > > 1. https://github.com/openjdk/panama-foreign/pull/698 > 2. https://github.com/openjdk/panama-foreign/pull/699 > 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 > 4. https://github.com/openjdk/panama-foreign/pull/740 > 5. https://github.com/openjdk/panama-foreign/pull/746 > 6. https://github.com/openjdk/panama-foreign/pull/742 > 7. 
https://github.com/openjdk/panama-foreign/pull/743 > > Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. > > The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. > > Please refer to the PR of each individual patch for a more detailed description. Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - Work around x86 failures - Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11019/files - new: https://git.openjdk.org/jdk/pull/11019/files/a40080e6..9e13922d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=00-01 Stats: 19 lines in 8 files changed: 8 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/11019.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11019/head:pull/11019 PR: https://git.openjdk.org/jdk/pull/11019 From dlong at openjdk.org Wed Nov 9 20:37:33 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 9 Nov 2022 20:37:33 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> Message-ID: <7oWZyGskpLZF66PtHsUkBIHq1WHXhMzPlZ5uARjqXpc=.7b5d5c21-3f22-4bf9-86ac-d35c39dee0f7@github.com> On Mon, 7 Nov 2022 13:24:26 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at 
every step of the iteration.
>
> Before this change, if printing one register/stack position crashes, then no more registers/stack positions will be printed.
>
> After this change, even if the VM is unstable and `print_location` crashes for some registers, the hs_err printing will recover and keep attempting to print the rest of the registers or stack values.
>
> Enables the following:
> ```C++
> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized())
>   os::print_register_info_header(st, _context);
>
>   REENTRANT_LOOP_START(os::print_nth_register_info_max_index())
>     // decode register contents if possible
>     ResourceMark rm(_thread);
>     os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context);
>   REENTRANT_LOOP_END
>
>   st->cr();
> ```
>
> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local)

We could also consider wrapping error reporting steps with an exception handler, rather than spending time tracking down every possible way they could crash. So instead of entering error reporting recursively, we back-track, skip the current step or sub-step, and continue. We lose information about why that step crashed, however. The "exception handler" could be implemented with something like sigsetjmp/siglongjmp.
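
To make the sigsetjmp/siglongjmp idea concrete, here is a minimal sketch of "skip the crashing step and continue". It is not HotSpot code: `on_fault`, `run_step_guarded`, `run_steps`, `StepFn` and the two globals are hypothetical names, it assumes a POSIX system, and it only guards SIGSEGV on a single thread.

```cpp
#include <cassert>
#include <setjmp.h>  // sigsetjmp / siglongjmp (POSIX)
#include <signal.h>  // sigaction, signal, raise (POSIX)

// All names here are hypothetical; this is an illustration, not VMError code.
static sigjmp_buf g_recovery;              // where to resume after a fault
static volatile sig_atomic_t g_armed = 0;  // non-zero only while a guarded step runs

extern "C" void on_fault(int sig) {
  if (g_armed) {
    siglongjmp(g_recovery, 1);  // back-track to the guard, abandoning the step
  }
  // Fault outside a guarded step: restore the default action and re-raise.
  signal(sig, SIG_DFL);
  raise(sig);
}

using StepFn = void (*)();

// Runs one step. Returns true if it completed, false if it faulted.
// Caveat: siglongjmp does not run C++ destructors of the abandoned frames.
static bool run_step_guarded(StepFn fn) {
  // savemask = 1: the signal mask is restored on the jump, so the signal
  // is deliverable again for later steps.
  if (sigsetjmp(g_recovery, 1) != 0) {
    g_armed = 0;
    return false;
  }
  g_armed = 1;
  fn();
  g_armed = 0;
  return true;
}

// Runs every step in order and returns how many were skipped after a fault.
int run_steps(const StepFn* steps, int count) {
  assert(count >= 0);
  struct sigaction sa = {}, old_sa = {};
  sa.sa_handler = on_fault;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, &old_sa);

  int skipped = 0;
  for (int i = 0; i < count; i++) {
    if (!run_step_guarded(steps[i])) {
      skipped++;
    }
  }

  sigaction(SIGSEGV, &old_sa, nullptr);  // put the previous handler back
  return skipped;
}
```

As noted above, the guard learns only that a step faulted, not why. A real implementation would also have to cope with resources (locks, ResourceMarks) held by the abandoned step.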
------------- PR: https://git.openjdk.org/jdk/pull/11017 From duke at openjdk.org Wed Nov 9 21:49:09 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 9 Nov 2022 21:49:09 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9] In-Reply-To: <cqMESA9bq6nErMs9ckafTqczQEz33XgEVax014m0hTM=.26c6bdfe-afb6-4406-941b-fc2ce1030389@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <Okq3ER5gNRyGhWX8ZRWmxDr0hyzs0DJ15EHul8xoloA=.6c7980ce-89da-4efa-80af-83e1da66945b@github.com> <cqMESA9bq6nErMs9ckafTqczQEz33XgEVax014m0hTM=.26c6bdfe-afb6-4406-941b-fc2ce1030389@github.com> Message-ID: <Q_0DbDYDLyHauXZO42I4-XZq0gig6-cSGDKNpN4qQWU=.afa82e7b-1ab9-4214-a1db-2bf8d7958d41@github.com> On Wed, 9 Nov 2022 00:23:21 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> fix 32-bit build > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 970: > >> 968: >> 969: void addmq(int disp, Register r1, Register r2); >> 970: > > Leftover formatting changes. done > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 95: > >> 93: >> 94: // OFFSET 64: mask_44 >> 95: 0xfffffffffff, 0xfffffffffff, > > Please, keep leading zeroes explicit in the constants. done. Also split things up and added ExternalAddress version of instructions. > src/hotspot/cpu/x86/stubRoutines_x86.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2013, 2022, Oracle and/or its affiliates. All rights reserved. > > No changes in the file anymore. 
done > src/hotspot/share/opto/library_call.cpp line 7014: > >> 7012: const TypeKlassPtr* rklass = TypeKlassPtr::make(instklass_ImmutableElement); >> 7013: const TypeOopPtr* rtype = rklass->as_instance_type()->cast_to_ptr_type(TypePtr::NotNull); >> 7014: Node* rObj = new CheckCastPPNode(control(), rFace, rtype); > > FTR it's an unsafe cast since it doesn't involve a runtime check from `IntegerModuloP` to `ImmutableElement`. Please, lift as much checks into Java wrapper as possible. @iwanowww Please have a look, just pushed a different way to fetch the limbs. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Nov 9 21:48:59 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 9 Nov 2022 21:48:59 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v10] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <evP4X-nr21v4n0RcBZNOTast7N2BH82V3qfekaSCFP0=.dbe1d26e-1950-4c5d-8a0e-85a0a85c8aa8@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 
110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: add getLimbs to interface and reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/da560452..8b1b40f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=08-09 Stats: 235 lines in 11 files changed: 103 ins; 79 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Nov 9 21:49:10 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 9 Nov 2022 21:49:10 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9] In-Reply-To: <jzdvvRS7lq0nJaImLvNtUm10ME5ZG5zIJlv6hZBe9oE=.9129c20d-5222-49f9-ac3f-bfa6142f4ce8@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <Okq3ER5gNRyGhWX8ZRWmxDr0hyzs0DJ15EHul8xoloA=.6c7980ce-89da-4efa-80af-83e1da66945b@github.com> 
<jzdvvRS7lq0nJaImLvNtUm10ME5ZG5zIJlv6hZBe9oE=.9129c20d-5222-49f9-ac3f-bfa6142f4ce8@github.com> Message-ID: <o7sZchJ_G720V14ob7l7gSAdxajqZR8yOl6B-aJ202U=.13bfb649-ff67-432c-929f-e0e4e5812d90@github.com> On Wed, 9 Nov 2022 15:55:53 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> fix 32-bit build > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1181: > >> 1179: #ifdef _LP64 >> 1180: if (supports_avx512ifma() & supports_avx512vlbw()) { >> 1181: if (FLAG_IS_DEFAULT(UsePolyIntrinsics)) { > > MaxVectorSize > 32 can be added along with feature checks your code mainly uses ZMMs done (`MaxVectorSize >= 64`) ------------- PR: https://git.openjdk.org/jdk/pull/10582 From ayang at openjdk.org Wed Nov 9 21:53:37 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 9 Nov 2022 21:53:37 GMT Subject: RFR: 8295871: G1: Use different explicit claim marks for CLDs [v2] In-Reply-To: <l1JcENUN2vXgRreobH9lPAdE55Kb0VNVRHlYLsQ8qIo=.0533dc50-6157-463b-87e4-f5798d300e48@github.com> References: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> <l1JcENUN2vXgRreobH9lPAdE55Kb0VNVRHlYLsQ8qIo=.0533dc50-6157-463b-87e4-f5798d300e48@github.com> Message-ID: <RI8Bg5O5gNHuW7E6TQX5gmKO2Wteksn811DNUCiChuk=.63dfd4c5-8885-4b18-a594-b77d31dda210@github.com> On Wed, 9 Nov 2022 17:30:47 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: >> Hi all, >> >> can I have reviews for this follow-up to [JDK-8295118](https://bugs.openjdk.org/browse/JDK-8295118) that removes the need to clear CLD claim marks for every full gc phase by using different claim values for the different phases. >> >> Some comments: >> * I used new g1 specific claim values instead of overloading the existing ones, which is imho clearer. I am open to better names, but something like `_claim_strong_2/3` seemed too cryptic. 
Then again, there is now a collector specific name in the enum. Maybe the enum values should be made collector-specific in some way? Currently they already are (e.g. `_claim_finalizable` is only used in ZGC) as G1 does not need the values except for (multiple) `_claim_strong`.
>> * I moved the CLD mark verification for the mark phase from `prepare_collection` to the constructor of `G1FullGCMarker`; I think this place is more fitting as directly above there is the use in the `CLDToOopClosure`. Also this pattern aligns with the use in the `G1FullGCAdjustTask`.
>>
>> Testing: tier1-5
>>
>> Thanks,
>> Thomas
>
> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>
> rename claim bits

src/hotspot/share/classfile/classLoaderData.hpp line 209:

> 207: _claim_strong = 3,
> 208: _claim_strong_stw_fullgc_mark = 4,
> 209: _claim_strong_stw_fullgc_adjust = 8,

I feel having `strong` in their names can be misleading -- there are no "finalizable" counterparts for them; IOW, there is only one kind of strength for `*_mark/adjust`. What do others think?
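
For readers following the thread, a minimal sketch of why distinct claim values remove the need to clear claim marks between full GC phases: each phase sets its own bit, so a claim made while marking does not block a claim made while adjusting. `ToyCLD` is a hypothetical stand-in for `ClassLoaderData`; the three non-zero enum values are copied from the snippet quoted above, while `_claim_none` and the compare-and-swap loop are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for ClassLoaderData; only the claim word is modelled.
class ToyCLD {
 public:
  enum Claim {
    _claim_none = 0,                      // assumption: not in the quoted snippet
    _claim_strong = 3,                    // values below are from the quoted snippet
    _claim_strong_stw_fullgc_mark = 4,
    _claim_strong_stw_fullgc_adjust = 8,
  };

  // Returns true only for the caller that first sets all bits of `claim`.
  bool try_claim(int claim) {
    int old_claim = _claim.load();
    while ((old_claim & claim) != claim) {
      // On failure, compare_exchange_weak reloads old_claim and we re-check.
      if (_claim.compare_exchange_weak(old_claim, old_claim | claim)) {
        return true;
      }
    }
    return false;  // somebody else already holds this claim
  }

  bool claimed(int claim) const { return (_claim.load() & claim) == claim; }

  // Clearing is only needed before the same claim value is reused.
  void clear_claim(int claim) { _claim.fetch_and(~claim); }

 private:
  std::atomic<int> _claim{_claim_none};
};
```

With this layout the mark phase claims with value 4 and the adjust phase with value 8, so one clear before the whole full GC is enough in the sketch.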
------------- PR: https://git.openjdk.org/jdk/pull/10989 From duke at openjdk.org Wed Nov 9 21:57:46 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 9 Nov 2022 21:57:46 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9] In-Reply-To: <cqMESA9bq6nErMs9ckafTqczQEz33XgEVax014m0hTM=.26c6bdfe-afb6-4406-941b-fc2ce1030389@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <Okq3ER5gNRyGhWX8ZRWmxDr0hyzs0DJ15EHul8xoloA=.6c7980ce-89da-4efa-80af-83e1da66945b@github.com> <cqMESA9bq6nErMs9ckafTqczQEz33XgEVax014m0hTM=.26c6bdfe-afb6-4406-941b-fc2ce1030389@github.com> Message-ID: <9VOSld7kTyK9X5jTVkY2Dm_7CVOdZlHzOcoXSF8iLG4=.0fb02250-253e-4c1a-9c4e-b7e147c3e2b2@github.com> On Tue, 8 Nov 2022 23:59:42 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> fix 32-bit build > > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175: > >> 173: >> 174: int blockMultipleLength = len & (~(BLOCK_LENGTH-1)); >> 175: Objects.checkFromIndexSize(offset, blockMultipleLength, input.length); > > I suggest to move the checks into `processMultipleBlocks`, introduce new static helper method specifically for the intrinsic part, and lift more logic (e.g., field loads) from the intrinsic into Java code. > > As an additional step, you can switch to double-register addressing mode (base + offset) for input data (`input`, `alimbs`, `rlimbs`) and simplify the intrinsic part even more (will involve a switch from `array_element_address` to `make_unsafe_address`). `array_element_address` vs `make_unsafe_address`. Don't know that I understood.. but going to guess :) "It might be cleaner to encode base+offset into the instruction opcode, save some `lea`s" I think that ship has 'sailed'? 
- `input`: I went and removed `offset` from intrinsic stub parameter list and instead passed it to `array_element_address`. But also, because I was really running out of GPRs, I had to do a `lea` before that at the function entry. Can't keep the offset register free for encoding.. - `alimbs`: offset already 0. Also, I mostly keep the actual value `a2:a1:a0` around. Just need address to write result back out. - `rlimbs`: offset already 0 and address itself discarded right after loading the R value into 2 GPRs. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Nov 9 22:00:40 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 9 Nov 2022 22:00:40 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6] In-Reply-To: <EsOqIOc_ALMLS4KMPlFF7-_NDw8AkVUlla194C8zERY=.699e46f8-7c6c-4f4e-95d3-04ad0e3d13e2@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <hX0rCTuGOWiCCGiKC7anEoiNosHutI1kiWb9Z-plUSY=.d7a8c833-f3dd-4f3a-98bf-1975109de48c@github.com> <lyp-yeLzhgxWaDGBB7GvJPgtMn5u_1k823-GZLIZPKM=.fbdc9545-5e30-4919-880c-f377b13d31ee@github.com> <xVrQBy6-ShiXGeObsxEoYonfCQ8r7f7VebUjJI1zP64=.5ffcb66d-5662-4fd0-8b51-7ca621a69757@github.com> <hFUaLY8cUCfXRbJCKtDcefgrwZvl2X53q1QPVUEosqc=.7980aa45-d193-49cc-b3f1-15a8eb45576e@github.com> <asa1G1rY6oVsgFHuXITPmNXCuryU8b5vJRNj8RMnZng=.f5f4e431-48c2-4896-998a-fb418992b581@github.com> <EsOqIOc_ALMLS4KMPlFF7-_NDw8AkVUlla194C8zERY=.699e46f8-7c6c-4f4e-95d3-04ad0e3d13e2@github.com> Message-ID: <k88mxyScYuDY8wmwiPM0Ey8o97ayPq-1qZvdXpRxei8=.b5e6c854-6669-4ee8-a6b8-212da546a672@github.com> On Wed, 9 Nov 2022 02:19:29 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >>> Did not split it up into individual constants. The main 'problem' is that Address and ExternalAddress are not compatible. 
>> >> There's a reason for that and it's because RIP-relative addressing doesn't always work, so additional register may be needed. >> >>> Most instructions do not take AddressLiteral, so can't use ExternalAddress to refer to those constants. >> >> I counted 4 instructions accessing the constants (`evpandq`, `andq`, `evporq`, and `vpternlogq`) in your patch. >> >> `macroAssembler_x86.hpp` is the place for `AddressLiteral`-related overloads (there are already numerous cases present) and it's trivial to add new ones. >> >>> (If I did get the instructions I use to take AddressLiteral, I think we would end up with more lea(rscratch)s generated; but that's more of a silver-lining) >> >> It depends on memory layout. If constants end up placed close enough in the address space, there'll be no additional instructions generated. >> >> Anyway, it doesn't look like something important from throughput perspective. Overall, I find it clearer when the code refers to individual constants through `AddressLiteral`s, but I'm also fine with it as it is now. > > Makes sense to me, that would indeed be cleaner, will add a couple more overloads. (Still getting used to what is 'clean' in this code base). done ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Thu Nov 10 01:22:04 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 10 Nov 2022 01:22:04 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <actl_ZA8F_vYPeNjov0QcGklPyBQjm-geVl1rPLtFpU=.c440119c-5b4e-4c98-9a57-4fa36785bcbf@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. 
>
> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
> - Added a JMH perf test.
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>
> Perf before:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>
> and after:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  fix windows and 32b linux builds

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10582/files
  - new: https://git.openjdk.org/jdk/pull/10582/files/8b1b40f7..abfc68f4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=09-10

  Stats: 5 lines in 3 files changed: 4 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10582.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582

PR: https://git.openjdk.org/jdk/pull/10582

From fyang at openjdk.org Thu Nov 10 01:32:54 2022
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 10 Nov 2022 01:32:54 GMT
Subject: RFR: 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null [v2]
In-Reply-To: <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com>
References: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> <PewIAA9YS84tV-xZpxn2DATcLbZ4W5wUaviBrvDrADE=.a8c697dc-c80e-421c-a2b5-36e58d24b8ce@github.com>
Message-ID: <lu5wH_ZN4BL2VCkMXfoxPsamOptGqqTg76HXbvGyyJw=.595b1972-7c42-459e-a0e8-6d852483ec04@github.com>

On Tue, 8 Nov 2022 09:00:38 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote:

>> Please see the JBS issue for more crash details.
>> >> To reproduce using a cross-compiled build: >> >> # dump one cds-nocoops.jsa >> <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:dump -Xlog:cds* -version >> >> # reproduce >> <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:on -XX:-TieredCompilation \ >> -Xlog:cds* -Xlog:gc+metaspace=info -jar renaissance-gpl-0.14.1.jar -r 1 movie-lens >> >> >> `MacroAssembler::en/decode_klass_not_null` uses the heapbase register as a temp register in the interpreter, which may kill the in-use value when enabling C2 compilation and `UseCompressedClassPointers` meanwhile disabling `UseCompressedOops`. C1 won't have this issue for the xheapbase is not its allocation candidate. When CDS is enabled, the narrow klass base is mapped to some address like `0x0000000800000000`, so `MacroAssembler::decode_klass_not_null`, which lacks registers, will use `xheapbase` as a temp to load the klass base and kill the register in the interpreter. So adding a `-XX:+DeoptimizeALot` can speedily reproduce the issue. >> >> To solve this, we shall decouple the xheapbase used as a temp register in `MacroAssembler::en/decode_klass_not_null`. AArch64 has advanced instructions so one register is enough to handle the logic. But in RISC-V we require at least two. >> >> This patch introduces another argument `tmp` to related functions to decouple and eliminate such heapbase usages. One thing that deserves noticing is the `cmp_klass` case, which usually gets used at the beginning of a method entry when `t1` is used as ic holder klass and `t0` is occupied there. These positions are special since nearly all registers are usable except ones used for arguments and special purposes (thread register, etc.). I propose to use a call-clobbered `t2` register here, to keep aligning the `i2c2i_adapter` logic[1]. >> >> Tested hotspot tier1~4 on QEMU; jdk tier1~tier2 and hotspot tier1~tier2 on my Hifive unmatched board, and the reproducible movie-lens benchmark. 
>> >> Thanks, >> Xiaolin >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L629 > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Fix as to comments I also tested this fix running non-trivial benchmark workloads like Renaissance, SPECjvm2008, SPECjbb2015 with both release & fastdebug builds. So I think it should be safe to integrate this. ------------- PR: https://git.openjdk.org/jdk/pull/11010 From xlinzheng at openjdk.org Thu Nov 10 01:34:46 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Thu, 10 Nov 2022 01:34:46 GMT Subject: Integrated: 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null In-Reply-To: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> References: <7NJqWAajnAuuq1Udf6GT8JeGZdNgBxWGASX0P8HhZE8=.9e7f7b23-3f6a-4954-91a0-d6a7ac123319@github.com> Message-ID: <V2zrs_CX-jWP-KYzHUBusaugZPhKUt5-y5MRrUiQNuI=.adc77196-5ac1-44d5-8a68-9c6003b300ce@github.com> On Mon, 7 Nov 2022 04:03:57 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: > Please see the JBS issue for more crash details. > > To reproduce using a cross-compiled build: > > # dump one cds-nocoops.jsa > <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:dump -Xlog:cds* -version > > # reproduce > <java> -XX:-UseCompressedOops -XX:+UseCompressedClassPointers -Xshare:on -XX:-TieredCompilation \ > -Xlog:cds* -Xlog:gc+metaspace=info -jar renaissance-gpl-0.14.1.jar -r 1 movie-lens > > > `MacroAssembler::en/decode_klass_not_null` uses the heapbase register as a temp register in the interpreter, which may kill the in-use value when enabling C2 compilation and `UseCompressedClassPointers` meanwhile disabling `UseCompressedOops`. C1 won't have this issue for the xheapbase is not its allocation candidate. 
When CDS is enabled, the narrow klass base is mapped to some address like `0x0000000800000000`, so `MacroAssembler::decode_klass_not_null`, which lacks registers, will use `xheapbase` as a temp to load the klass base and kill the register in the interpreter. So adding a `-XX:+DeoptimizeALot` can speedily reproduce the issue. > > To solve this, we shall decouple the xheapbase used as a temp register in `MacroAssembler::en/decode_klass_not_null`. AArch64 has advanced instructions so one register is enough to handle the logic. But in RISC-V we require at least two. > > This patch introduces another argument `tmp` to related functions to decouple and eliminate such heapbase usages. One thing that deserves noticing is the `cmp_klass` case, which usually gets used at the beginning of a method entry when `t1` is used as ic holder klass and `t0` is occupied there. These positions are special since nearly all registers are usable except ones used for arguments and special purposes (thread register, etc.). I propose to use a call-clobbered `t2` register here, to keep aligning the `i2c2i_adapter` logic[1]. > > Tested hotspot tier1~4 on QEMU; jdk tier1~tier2 and hotspot tier1~tier2 on my Hifive unmatched board, and the reproducible movie-lens benchmark. > > Thanks, > Xiaolin > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp#L629 This pull request has now been integrated. 
Changeset: 93fed9b2 Author: Xiaolin Zheng <xlinzheng at openjdk.org> Committer: Fei Yang <fyang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/93fed9b251c21f20b68ddc4e179d6595275dbcd2 Stats: 52 lines in 8 files changed: 7 ins; 6 del; 39 mod 8296448: RISC-V: Fix temp usages of heapbase register killed by MacroAssembler::en/decode_klass_not_null Reviewed-by: fyang, yadongwang ------------- PR: https://git.openjdk.org/jdk/pull/11010 From yadongwang at openjdk.org Thu Nov 10 02:47:35 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Thu, 10 Nov 2022 02:47:35 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v8] In-Reply-To: <mKll40RxkRgJqFx273bfgeNaylWIT6GqjA8rxMuJ9tU=.4b9b70e5-d36d-4d83-a78c-18dfe84d8da1@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <mKll40RxkRgJqFx273bfgeNaylWIT6GqjA8rxMuJ9tU=.4b9b70e5-d36d-4d83-a78c-18dfe84d8da1@github.com> Message-ID: <FmXAPMTywS_XDax4dGwlSVfKvb6hQYNFJowRUIUwFeM=.ac2f974a-7a6f-4ab4-b441-a71f228e6ecb@github.com> On Wed, 9 Nov 2022 17:02:13 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. >> >> It passes `hotspot:tier1` test suite > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > review lgtm ------------- Marked as reviewed by yadongwang (Author). 
PR: https://git.openjdk.org/jdk/pull/10884 From duke at openjdk.org Thu Nov 10 03:09:43 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 10 Nov 2022 03:09:43 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11] In-Reply-To: <actl_ZA8F_vYPeNjov0QcGklPyBQjm-geVl1rPLtFpU=.c440119c-5b4e-4c98-9a57-4fa36785bcbf@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <actl_ZA8F_vYPeNjov0QcGklPyBQjm-geVl1rPLtFpU=.c440119c-5b4e-4c98-9a57-4fa36785bcbf@github.com> Message-ID: <9Ststt1zBbU04qp9Ilb7zPQx3bA5uIQEi-TtbpiMn1s=.01700387-0a3b-4f47-9daa-1febc7230539@github.com> On Thu, 10 Nov 2022 01:22:04 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 
154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > fix windows and 32b linux builds Revised numbers with `getLimbs()` interface change. Compared to the previous version that got limbs in IR, the change is within deviation (mostly -1%).

datasize | master | optimized | disabled | opt/mst | dis/mst
-- | -- | -- | -- | -- | --
32 | 3218169 | 3651078 | 3116558 | 1.13 | 0.97
64 | 2858030 | 3407518 | 2824903 | 1.19 | 0.99
128 | 2396796 | 3357224 | 2394802 | 1.40 | 1.00
256 | 1780679 | 3050142 | 1751130 | 1.71 | 0.98
512 | 1168824 | 2938952 | 1148479 | 2.51 | 0.98
1024 | 648772.1 | 2728454 | 687016.7 | 4.21 | 1.06
2048 | 357009 | 2393507 | 392928.2 | 6.70 | 1.10
16384 | 48854.33 | 903175.4 | 52874.78 | 18.49 | 1.08
1048576 | 771.461 | 14951.24 | 840.792 | 19.38 | 1.09

------------- PR: https://git.openjdk.org/jdk/pull/10582 From yadongwang at openjdk.org Thu Nov 10 03:24:16 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Thu, 10 Nov 2022 03:24:16 GMT Subject: RFR: 8296630: Fix SkipIfEqual on AArch64 and RISC-V Message-ID: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com> SkipIfEqual was supposed to load a flag value from memory, compare it with an input boolean value, and jump to a specific label if they are equal. The implementation on x86 and s390 platforms meets expectations, and ppc uses SkipIfEqualZero. However, on AArch64 and RISC-V platforms, the input argument "value" is not used, and only a jump-if-equal-zero is generated. That's not correct, but it works since only false is passed at all call sites so far.
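The mismatch described in the paragraph above can be modelled outside the assembler. The following is a minimal sketch in plain C++, not HotSpot code: both function names are hypothetical, and each returns true when the guarded code would be skipped. It only illustrates why ignoring `value` goes unnoticed as long as every caller passes false.

```cpp
// Hypothetical model of the two behaviours; not taken from HotSpot.

// Intended semantics (as described for x86 and s390): skip the guarded
// code when the flag loaded from memory equals the requested value.
bool skip_if_equal(const bool* flag_addr, bool value) {
  return *flag_addr == value;
}

// Behaviour described for AArch64 and RISC-V before the fix: `value` is
// ignored and the skip is taken whenever the loaded flag is zero.
bool skip_if_equal_zero_only(const bool* flag_addr, bool /*value*/) {
  return *flag_addr == false;
}
```

With `value == false` the two functions agree for either flag state; with `value == true` they disagree for either flag state, which is the case the patch is meant to cover.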
AArch64 tier1, riscv hotspot & jdk tier1 have been tested. Additional cases with dtrace tested on AArch64: test/hotspot/jtreg/serviceability/dtrace/DTraceOptionsTest.java test/hotspot/jtreg/compiler/runtime/Test8168712.java ------------- Commit messages: - 8296630: Fix SkipIfEqual on AArch64 and RISC-V Changes: https://git.openjdk.org/jdk/pull/11076/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11076&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296630 Stats: 10 lines in 2 files changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11076.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11076/head:pull/11076 PR: https://git.openjdk.org/jdk/pull/11076 From dholmes at openjdk.org Thu Nov 10 04:55:33 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 10 Nov 2022 04:55:33 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> Message-ID: <EJZU_tuTd44fTZYHHxarQKdlY7GY66OwYEXwnyMCvDE=.ef2ef916-cd85-4bae-8e81-68bd8d341ac5@github.com> On Tue, 8 Nov 2022 16:19:47 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > Generational ZGC will need to patch nmethod instructions outside of safepoints, and guard entries into the nmethods with cross modifying code fences. This is mostly taken care of by nmethod entry barrier code. But there are a few entries that don't go through nmethod entry barriers that need fixing. In particular when entering an nmethod by returning through the stack watermark barrier. This patch ensures that whenever the stack watermark barrier exposes a new nmethod, we also ensure that a cross modify fence is executed, so that any concurrently updated instructions can be safely executed. > Hope this explanation makes sense Yes thanks.
I'm willing to let this ride and see if there are any issues later. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/11042 From dholmes at openjdk.org Thu Nov 10 04:55:34 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 10 Nov 2022 04:55:34 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <uUYptwQe55wV_M5isGTs_xMUKFJ6lJUufFoUmJJ5y44=.ced3b2cd-021a-43ee-817a-80354b09c639@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> <XZwERqvcDEA9iyhsPM5LljfPSocmFKRiQ_cHTYnBULY=.9647349b-94f3-4be7-8b36-1b0facfe8639@github.com> <uUYptwQe55wV_M5isGTs_xMUKFJ6lJUufFoUmJJ5y44=.ced3b2cd-021a-43ee-817a-80354b09c639@github.com> Message-ID: <ppll_2abzQxtuvoS6-7EbsmIDAL7YxFU21CTrS7Z6gE=.6d2723ba-2036-4964-83d8-dd56ea4f1714@github.com> On Wed, 9 Nov 2022 10:42:47 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> src/hotspot/share/runtime/safepointMechanism.cpp line 108: >> >>> 106: if (prev_poll_word != poll_word || >>> 107: prev_poll_word == _poll_word_armed_value) { >>> 108: // After updating the poll value, we allow entering new nmethods >> >> I'm a little confused about the positioning here. The comment says "after updating the poll value", but we haven't updated yet (happens below) so don't we need the fence after that point? > > The important thing is that it is called after the handshake/safepoint/stack watermark operation. The wording is just unfortunate. Maybe I should change it to "While updating the poll value" to be less confusing? Does that read better? Yes "while" reads better - thanks.
------------- PR: https://git.openjdk.org/jdk/pull/11042 From dholmes at openjdk.org Thu Nov 10 04:55:34 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 10 Nov 2022 04:55:34 GMT Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <qGmHOeXEV4B1b1nNVm-uilJvmyoOA5BJsybw418puOQ=.7c2e0cb2-7dc0-4b60-8009-1ef4345c3c9c@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> <XZwERqvcDEA9iyhsPM5LljfPSocmFKRiQ_cHTYnBULY=.9647349b-94f3-4be7-8b36-1b0facfe8639@github.com> <rAyPuMDllxJwFOd4d-a9Cbo_MjkmqdirKbWXFaLfZKs=.8f33ca2d-1877-4667-a45c-ab72d426cc3d@github.com> <qGmHOeXEV4B1b1nNVm-uilJvmyoOA5BJsybw418puOQ=.7c2e0cb2-7dc0-4b60-8009-1ef4345c3c9c@github.com> Message-ID: <VBY99N_iFPMjhWgOCNQcXpzMMO1mHt6hKh9Rmb_JuYo=.4e30e53f-733d-4ec4-a56b-d3eb1d520724@github.com> On Wed, 9 Nov 2022 10:41:04 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> The new logic inside of update_poll_values makes is more accurate and made this cross_modify_fence redundant. That's why I removed it. > >> Has this simply been moved to cover more paths? > > I removed it because it is redundant after updating update_poll_values to more precisely identify when cross_modify_fence should be called. It will then already have been called inside of update_poll_values, when it was needed here at the call site. Okay.
------------- PR: https://git.openjdk.org/jdk/pull/11042 From dholmes at openjdk.org Thu Nov 10 04:59:38 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 10 Nov 2022 04:59:38 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v4] In-Reply-To: <9hoxdCjZvhZnTe3MxdabVMelQ0Tsghi3E3i6lHe6l5g=.74b17dd8-afb3-43f8-bd26-34d4d26c7373@github.com> References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> <9hoxdCjZvhZnTe3MxdabVMelQ0Tsghi3E3i6lHe6l5g=.74b17dd8-afb3-43f8-bd26-34d4d26c7373@github.com> Message-ID: <lOsQyQmeGPXEuQP1LkfUTyi0SKHski67ihvPT2NFmUk=.12769ba3-8ceb-418e-9934-78bd92202331@github.com> On Wed, 9 Nov 2022 14:20:57 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. >> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provides an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Arenas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers.
>> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. >> >> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >> >> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Fix after merge > - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types > - Remove riscv empty destructors > - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types > - Work around gtest exception compilation issues > - Fix Shenandoah > - Remove AnyObj new operator taking an allocation_type > - Use more specific allocation types Marked as reviewed by dholmes (Reviewer). 
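To make the quoted proposal concrete, here is a minimal sketch of how class-specific `operator new` overloads can express "pass a memory tag to allocate on the C heap, pass an arena to allocate in that arena". Everything in it is hypothetical: `MemTag` only stands in for the role MEMFLAGS plays, `ToyArena` is a toy bump allocator, and `Node` is an illustrative class. None of it is taken from the patch or from HotSpot's allocation headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical memory-category tag (stand-in for the role of MEMFLAGS).
enum class MemTag { Internal, Compiler };

// Hypothetical toy arena: a fixed buffer with bump-pointer allocation.
class ToyArena {
  alignas(std::max_align_t) unsigned char _buf[1024];
  std::size_t _used = 0;
 public:
  void* allocate(std::size_t size) {
    const std::size_t a = alignof(std::max_align_t);
    size = (size + a - 1) / a * a;               // round up to keep alignment
    if (size > sizeof(_buf) - _used) throw std::bad_alloc();
    void* p = _buf + _used;
    _used += size;
    return p;
  }
  bool contains(const void* p) const {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto base = reinterpret_cast<std::uintptr_t>(_buf);
    return addr >= base && addr < base + sizeof(_buf);
  }
};

// Hypothetical class that supports two strategies, chosen at the call site.
struct Node {
  int value;

  // Passing a tag selects the C heap.
  static void* operator new(std::size_t size, MemTag) {
    void* p = std::malloc(size);
    if (p == nullptr) throw std::bad_alloc();
    return p;
  }
  // Passing an arena selects that arena.
  static void* operator new(std::size_t size, ToyArena* arena) {
    return arena->allocate(size);
  }
  // Matching placement deletes, used only if a constructor throws.
  static void operator delete(void* p, MemTag) { std::free(p); }
  static void operator delete(void*, ToyArena*) {}

  // Heap-allocated nodes are released explicitly; arena nodes go away
  // together with their arena.
  static void release_heap(Node* n) { n->~Node(); std::free(n); }
};
```

A call site then reads `new (MemTag::Internal) Node{2}` for the C heap and `new (&arena) Node{1}` for an arena, so the strategy is visible where the object is created rather than being implied by the base class name.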
------------- PR: https://git.openjdk.org/jdk/pull/10745 From fyang at openjdk.org Thu Nov 10 05:16:36 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 10 Nov 2022 05:16:36 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v8] In-Reply-To: <mKll40RxkRgJqFx273bfgeNaylWIT6GqjA8rxMuJ9tU=.4b9b70e5-d36d-4d83-a78c-18dfe84d8da1@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <mKll40RxkRgJqFx273bfgeNaylWIT6GqjA8rxMuJ9tU=.4b9b70e5-d36d-4d83-a78c-18dfe84d8da1@github.com> Message-ID: <3FuUm6ceXWdbjFPxeCOtp9iJGBL-k8_HPsftiecXcxY=.1a40d37d-27d2-47f4-a6e3-b0494027e68f@github.com> On Wed, 9 Nov 2022 17:02:13 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. >> >> It passes `hotspot:tier1` test suite > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1920: > 1918: if (is_imm_in_range(imm, 12, 0)) { > 1919: and_imm12(Rd, Rn, imm); > 1920: } else if (is_power_of_2(~(imm - 1))) { Since this has been reworked, is this change still necessary then? I assume this won't make a difference in number of instructions emitted as compared with the final else branch? 
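For readers following the question about `is_power_of_2(~(imm - 1))`: in two's complement `~(imm - 1)` equals `-imm`, so the test accepts immediates of the form -(2^k), that is, masks whose low k bits are clear and whose remaining bits are set. The sketch below models this with unsigned 64-bit arithmetic. The helper names are hypothetical, edge cases may differ from HotSpot's own `is_power_of_2`, and the shift-based AND is only one way such a mask could be applied; what the patch emits in that branch is not shown in this excerpt.

```cpp
#include <cstdint>

// True when x has exactly one bit set.
constexpr bool is_power_of_2(uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// True when imm, viewed as a 64-bit two's-complement value, is -(2^k):
// the low k bits are clear and every bit from position k upward is set.
constexpr bool is_low_bits_clear_mask(int64_t imm) {
  const uint64_t u = static_cast<uint64_t>(imm);
  return is_power_of_2(~(u - 1));  // ~(u - 1) == -u in two's complement
}

// x & -(2^k) computed with two shifts instead of a materialized constant.
constexpr uint64_t and_by_shifts(uint64_t x, unsigned k) {
  return (x >> k) << k;
}
```

For example, -4096 (0x...F000) passes the check with k = 12, while 4096, -4095 and 0 do not.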
------------- PR: https://git.openjdk.org/jdk/pull/10884 From stuefe at openjdk.org Thu Nov 10 05:46:20 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 10 Nov 2022 05:46:20 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <7oWZyGskpLZF66PtHsUkBIHq1WHXhMzPlZ5uARjqXpc=.7b5d5c21-3f22-4bf9-86ac-d35c39dee0f7@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> <7oWZyGskpLZF66PtHsUkBIHq1WHXhMzPlZ5uARjqXpc=.7b5d5c21-3f22-4bf9-86ac-d35c39dee0f7@github.com> Message-ID: <uu_zW58400fwJISnzECyc24LfWIdfIHDpgX2tvg2AUA=.40289e23-7a83-4937-a396-ffb0e965159b@github.com> On Wed, 9 Nov 2022 20:33:35 GMT, Dean Long <dlong at openjdk.org> wrote: > We could also consider wrapping error reporting steps with an exception handler, rather than spending time tracking down every possible way they could crash. So instead of entering error reporting recursively, we back-track, skip the current step or sub-step, and continue. We lose information about why that step crashed, however. The "exception handler" could be implemented with something like sigsetjmp/siglongjmp. I think the "never unwind stack" is a deliberate decision. If you unwind the stack, e.g. using longjmp, you risk errors in the follow up STEPs. E.g. by tearing ResourceMark chains. 
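The "exception handler" idea quoted above can be sketched with POSIX `sigsetjmp`/`siglongjmp`. This is a minimal illustration under stated assumptions, not HotSpot's error reporter: `run_step`, `good_step` and `bad_step` are hypothetical names, the fault is simulated with `raise(SIGSEGV)`, and the code assumes a POSIX platform.

```cpp
#include <setjmp.h>
#include <signal.h>

// Jump target for the step currently being executed (hypothetical helper state).
static sigjmp_buf g_step_env;

// Signal handler: abandon the current step by jumping back into run_step.
extern "C" void on_fault(int) {
  siglongjmp(g_step_env, 1);
}

// Runs one reporting step. Returns false if the step raised SIGSEGV and
// was abandoned, true if it ran to completion.
bool run_step(void (*step)()) {
  struct sigaction sa = {};
  struct sigaction old = {};
  sa.sa_handler = on_fault;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, &old);

  // Second argument 1: save the signal mask so siglongjmp restores it and
  // SIGSEGV is unblocked again after leaving the handler.
  if (sigsetjmp(g_step_env, 1) == 0) {
    step();
    sigaction(SIGSEGV, &old, nullptr);
    return true;
  }
  sigaction(SIGSEGV, &old, nullptr);
  return false;
}

void good_step() {}
void bad_step() { raise(SIGSEGV); }  // simulated crash inside a step
```

The sketch also shows the concern raised in the reply: `siglongjmp` abandons the frames between the fault and `run_step` without running destructors, so any scoped object in the failed step (a ResourceMark, in the HotSpot case) is never unwound properly.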
------------- PR: https://git.openjdk.org/jdk/pull/11017 From fyang at openjdk.org Thu Nov 10 08:12:29 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 10 Nov 2022 08:12:29 GMT Subject: RFR: 8296301: Interpreter(RISC-V): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options In-Reply-To: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com> References: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com> Message-ID: <aA_8qld8C5G6zCBuwSqx_dHM3qWUlz9Iep-DHYl4Rl8=.6da2d031-7d32-49ce-a4c7-1aae49057990@github.com> On Wed, 9 Nov 2022 03:08:57 GMT, Yanhong Zhu <yzhu at openjdk.org> wrote: > In this patch, count_bytecode() is modified by using "x7" as temporary register. Also implement histogram_bytecode() and histogram_bytecode_pair(), which can be enabled on debug mode by setting the options PrintBytecodeHistogram and PrintBytecodePairHistogram. > > The following is the output when PrintBytecodeHistogram or PrintBytecodePairHistogram is TRUE. 
> > $ java -XX:+PrintBytecodeHistogram --version|head -n 20 > openjdk 20 2022-11-09 > OpenJDK Runtime Environment (fastdebug build 20) > OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode) > > Histogram of 8101142 executed bytecodes: > > absolute relative code name > ---------------------------------------------------------------------- > 634592 7.83% dc fast_aload_0 > 471840 5.82% b6 invokevirtual > 376275 4.64% 2b aload_1 > 358520 4.43% e0 fast_iload > 332267 4.10% de fast_aaccess_0 > 270189 3.34% a7 goto > 249831 3.08% 19 aload > 223361 2.76% b9 invokeinterface > 215666 2.66% 1c iload_2 > 194877 2.41% b8 invokestatic > 192212 2.37% 2c aload_2 > 185826 2.29% 1b iload_1 > > $ java -XX:+PrintBytecodePairHistogram --version|head -n 20 > openjdk 20 2022-11-09 > OpenJDK Runtime Environment (fastdebug build 20) > OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode) > > Histogram of 7627721 executed bytecode pairs: > > absolute relative codes 1st bytecode 2nd bytecode > ---------------------------------------------------------------------- > 102673 1.346% 84 a7 iinc goto > 85429 1.120% dc 2b fast_aload_0 aload_1 > 84394 1.106% dc b6 fast_aload_0 invokevirtual > 73131 0.959% b7 dc invokespecial fast_aload_0 > 64605 0.847% 2b b6 aload_1 invokevirtual > 64086 0.840% dc b9 fast_aload_0 invokeinterface > 63663 0.835% b6 dc invokevirtual fast_aload_0 > 59946 0.786% b6 de invokevirtual fast_aaccess_0 > 56631 0.742% 36 e0 istore fast_iload > 51261 0.672% b9 de invokeinterface fast_aaccess_0 > 49556 0.650% 3a 19 astore aload > 49106 0.644% a7 e0 goto fast_iload src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 1739: > 1737: void TemplateInterpreterGenerator::count_bytecode() { > 1738: __ mv(x7, (address) &BytecodeCounter::_counter_value); > 1739: __ atomic_addalw(noreg, 1, x7); I think a simpler __ atomic_addw will do here? I don't think there is any memory ordering issue here. 
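The point about memory ordering can be illustrated in portable C++. The sketch below assumes that the `al` in `atomic_addalw` denotes acquire and release ordering on the atomic add, and that `atomic_addw` is the unordered form; the counter and function names are hypothetical stand-ins, not HotSpot code. A statistics counter only needs the increment itself to be atomic, which `memory_order_relaxed` already guarantees.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a bytecode counter.
std::atomic<uint32_t> bytecode_counter{0};

// Atomic increment with no ordering constraints on surrounding memory
// accesses (the analogue of the simpler instruction suggested above).
void count_relaxed() {
  bytecode_counter.fetch_add(1, std::memory_order_relaxed);
}

// Atomic increment that additionally orders surrounding memory accesses
// (the analogue of the acquire/release form, under the assumption above).
void count_acq_rel() {
  bytecode_counter.fetch_add(1, std::memory_order_acq_rel);
}
```

Both variants produce an exact total even with concurrent callers; they differ only in what they promise about other loads and stores around the increment, which a histogram counter does not rely on.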
------------- PR: https://git.openjdk.org/jdk/pull/11051 From stefank at openjdk.org Thu Nov 10 08:35:14 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 10 Nov 2022 08:35:14 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v4] In-Reply-To: <9hoxdCjZvhZnTe3MxdabVMelQ0Tsghi3E3i6lHe6l5g=.74b17dd8-afb3-43f8-bd26-34d4d26c7373@github.com> References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> <9hoxdCjZvhZnTe3MxdabVMelQ0Tsghi3E3i6lHe6l5g=.74b17dd8-afb3-43f8-bd26-34d4d26c7373@github.com> Message-ID: <t8vSx3BJkvuiZAvO0LiXViL1p_E_GzuG7oXwvINaC8c=.a482dba1-cc6a-4193-99f4-3ba069b2ad81@github.com> On Wed, 9 Nov 2022 14:20:57 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. >> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provides an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Arenas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers.
>> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. >> >> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >> >> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Fix after merge > - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types > - Remove riscv empty destructors > - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types > - Work around gtest exception compilation issues > - Fix Shenandoah > - Remove AnyObj new operator taking an allocation_type > - Use more specific allocation types Thanks all, for reviewing and the discussion! 
------------- PR: https://git.openjdk.org/jdk/pull/10745 From stefank at openjdk.org Thu Nov 10 08:35:16 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 10 Nov 2022 08:35:16 GMT Subject: Integrated: 8295475: Move non-resource allocation strategies out of ResourceObj In-Reply-To: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: <4YCBXDu5_4zTjFwluiIyKsAPVvk6nrkHIIwtAZGOL1s=.5e7e3ed7-b2e4-47e4-b3e8-4bdeed78ca75@github.com> On Tue, 18 Oct 2022 12:57:48 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Background to this patch: > > This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. > > PR RFC: > > HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: > > MetaspaceObj - allocates in the Metaspace > CHeap - uses malloc > ResourceObj - ... > > The last class sounds like it provides an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Arenas or even CHeap allocated memory. > > This is IMHO misleading, and often leads to confusion among HotSpot developers. > > I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj.
> > In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. > > The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. > > The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. This pull request has now been integrated. Changeset: bfc58165 Author: Stefan Karlsson <stefank at openjdk.org> URL: https://git.openjdk.org/jdk/commit/bfc58165952a1d51ad2bfce60963633f17ac43ec Stats: 497 lines in 164 files changed: 82 ins; 53 del; 362 mod 8295475: Move non-resource allocation strategies out of ResourceObj Reviewed-by: coleenp, stuefe, rehn, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/10745 From tschatzl at openjdk.org Thu Nov 10 08:52:16 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Nov 2022 08:52:16 GMT Subject: RFR: 8295871: G1: Use different explicit claim marks for CLDs [v3] In-Reply-To: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> References: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> Message-ID: <rNeAcxnCtNbH1Tqen9_Yl7CkpcS_dP2_3fUGzeF27vg=.f5862170-24e7-47ba-a083-23964d02a2af@github.com> > Hi all, > > can I have reviews for this follow-up to [JDK-8295118](https://bugs.openjdk.org/browse/JDK-8295118) that removes the need to clear CLD claim marks for every full gc phase by using different claim values for the different phases. > > Some comments: > * I used new g1 specific claim values instead of overloading the existing ones, which is imho clearer. I am open to better names, but something like `_claim_strong_2/3` seemed too cryptic. Then again, there is now a collector specific name in the enum. Maybe the enum values should be made collector-specific in some way? Currently they already are (e.g. 
`_claim_finalizable` is only used in ZGC) as G1 does not need the values except for (multiple) `_claim_strong`. > * I moved the CLD mark verification for the mark phase from `prepare_collection` to the constructor of `G1FullGCMarker`; I think this place is more fitting as directly above there is the use in the `CLDToOopClosure`. Also this pattern aligns with the use in the `G1FullGCAdjustTask`. > > Testing: tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10989/files - new: https://git.openjdk.org/jdk/pull/10989/files/48c96fc4..88e4a990 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10989&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10989&range=01-02 Stats: 12 lines in 4 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/10989.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10989/head:pull/10989 PR: https://git.openjdk.org/jdk/pull/10989 From tschatzl at openjdk.org Thu Nov 10 08:52:19 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Nov 2022 08:52:19 GMT Subject: RFR: 8295871: G1: Use different explicit claim marks for CLDs [v2] In-Reply-To: <RI8Bg5O5gNHuW7E6TQX5gmKO2Wteksn811DNUCiChuk=.63dfd4c5-8885-4b18-a594-b77d31dda210@github.com> References: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> <l1JcENUN2vXgRreobH9lPAdE55Kb0VNVRHlYLsQ8qIo=.0533dc50-6157-463b-87e4-f5798d300e48@github.com> <RI8Bg5O5gNHuW7E6TQX5gmKO2Wteksn811DNUCiChuk=.63dfd4c5-8885-4b18-a594-b77d31dda210@github.com> Message-ID: <hjqy6pif3FK6e7z33-WQsIJ-OFOxKOXpgMmxS0Kmcmo=.73475382-2efd-4021-aec0-f417005eec8f@github.com> On Wed, 9 Nov 2022 21:51:32 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> 
rename claim bits
>
> src/hotspot/share/classfile/classLoaderData.hpp line 209:
>
>> 207: _claim_strong = 3,
>> 208: _claim_strong_stw_fullgc_mark = 4,
>> 209: _claim_strong_stw_fullgc_adjust = 8,
>
> I feel having `strong` in their names can be misleading -- there are no "finalizable" counterparts for them; IOW, there is only one kind of strength for `*_mark/adjust`. What do others think?

Done.

-------------

PR: https://git.openjdk.org/jdk/pull/10989

From ngasson at openjdk.org Thu Nov 10 09:12:27 2022
From: ngasson at openjdk.org (Nick Gasson)
Date: Thu, 10 Nov 2022 09:12:27 GMT
Subject: RFR: 8296630: Fix SkipIfEqual on AArch64 and RISC-V
In-Reply-To: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com>
References: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com>
Message-ID: <kQfqcd3WP6klCY_d0K73Y0emWg8CdlsFlwtFqB1oT4g=.6748ad9c-e3a4-410d-a809-0d4ed8a8e40a@github.com>

On Thu, 10 Nov 2022 03:17:37 GMT, Yadong Wang <yadongwang at openjdk.org> wrote:

> SkipIfEqual was supposed to load a flag value from some memory, compare it with an input boolean value, and jump to a specific label if they are equal. The implementation on x86 and s390 platforms meets expectations, and ppc uses SkipIfEqualZero. However, on AArch64 and RISC-V platforms, the input argument "value" is not used, and only a jump-if-equal-zero is generated. That's not correct, but it works well since only false is passed at all call sites so far.
>
> AArch64 tier1, riscv hotspot & jdk tier1 have been tested.
> Additional cases with dtrace tested on AArch64:
> test/hotspot/jtreg/serviceability/dtrace/DTraceOptionsTest.java
> test/hotspot/jtreg/compiler/runtime/Test8168712.java

Marked as reviewed by ngasson (Reviewer).
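The intended SkipIfEqual behaviour described in the quoted text can be modelled in plain C++. This is only a sketch of the decision logic, not HotSpot code: the `Branch` enum and the three helper functions are hypothetical names, and the two enumerators merely borrow the AArch64 mnemonics for "branch if zero" and "branch if non-zero".

```cpp
#include <cassert>

// Hypothetical stand-ins for the AArch64 compare-and-branch instructions:
// CBZW branches when the register is zero, CBNZW when it is non-zero.
enum class Branch { CBZW, CBNZW };

// Pick the branch that skips the guarded code when the loaded flag equals `value`.
inline Branch branch_for_skip_if_equal(bool value) {
    return value ? Branch::CBNZW : Branch::CBZW;
}

// Model whether the chosen branch is taken for a loaded flag byte.
inline bool branch_taken(Branch b, unsigned loaded_flag) {
    return b == Branch::CBZW ? loaded_flag == 0 : loaded_flag != 0;
}

// True when the guarded code is skipped, i.e. when flag == value.
inline bool skips(bool flag, bool value) {
    return branch_taken(branch_for_skip_if_equal(value), flag ? 1u : 0u);
}
```

A version that ignores `value` and always picks the branch-if-zero form gives the same answers as long as every caller passes `false`, which is why the problem went unnoticed.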
-------------

PR: https://git.openjdk.org/jdk/pull/11076

From stefank at openjdk.org Thu Nov 10 09:33:58 2022
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 10 Nov 2022 09:33:58 GMT
Subject: RFR: 8296774: Removed default MEMFLAGS value from CHeapBitMap
Message-ID: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com>

Today it is easy to accidentally create CHeapBitMaps that use the default mtInternal MEMFLAGS instead of a value that is suitable for the subsystem. I fixed the instances I could find with #10948 / [JDK-8296231](https://bugs.openjdk.org/browse/JDK-8296231).

For that PR I didn't want to change the constructors of the bitmap because #10941 / [JDK-8296139](https://bugs.openjdk.org/browse/JDK-8296139) was out for review. Now that that change has been pushed I'd like to change the constructors of the CHeapBitMap, so that we don't accidentally make these mistakes.

When making it mandatory to pass MEMFLAGS, it becomes apparent that the current parameter order is a bit odd. If you look closely you see that all three parameters are optional. When I now want to make MEMFLAGS mandatory, I'd like to move it so that it always is the first parameter. This will simplify the constructors a bit, IMHO.

This is what the constructors look like before the patch:

    CHeapBitMap() : CHeapBitMap(mtInternal) {}
    explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {}
    CHeapBitMap(idx_t size_in_bits, MEMFLAGS flags = mtInternal, bool clear = true);

And I'd like to change it to:

    explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {}
    CHeapBitMap(MEMFLAGS flags, idx_t size_in_bits, bool clear = true);

In effect, this makes `flags` mandatory and `size_in_bits` and `clear` optional.
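The effect of putting a mandatory flags parameter first can be tried out in a small stand-alone sketch. Everything here is an assumption made for illustration: `MemFlags` is a stand-in enum that only borrows a few MEMFLAGS names, and `BitMapSketch` is a toy class backed by `std::vector<bool>`, not the HotSpot `CHeapBitMap`.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for HotSpot's MEMFLAGS; only the names are borrowed.
enum class MemFlags { mtInternal, mtGC, mtCompiler };

// Toy bitmap whose constructors mirror the proposed parameter order.
class BitMapSketch {
 public:
  // No default for flags: every caller has to name a memory category.
  explicit BitMapSketch(MemFlags flags) : _flags(flags), _bits() {}

  // flags first, then the optional sizing arguments.
  // This sketch always zero-fills; `clear` is kept only to mirror the signature.
  BitMapSketch(MemFlags flags, std::size_t size_in_bits, bool clear = true)
      : _flags(flags), _bits(size_in_bits, false) {
    (void)clear;
  }

  MemFlags flags() const { return _flags; }
  std::size_t size() const { return _bits.size(); }

 private:
  MemFlags _flags;
  std::vector<bool> _bits;
};

// With no default constructor, forgetting the category is a compile-time error.
static_assert(!std::is_default_constructible<BitMapSketch>::value,
              "a memory category must always be supplied");
```

With this shape, `BitMapSketch bm;` no longer compiles, while `BitMapSketch bm(MemFlags::mtGC);` and `BitMapSketch bm(MemFlags::mtGC, 64);` both do.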

We could probably condense this even further into just one constructor:

    explicit CHeapBitMap(MEMFLAGS flags, size_t size_in_bits = 0, bool clear = true) : GrowableBitMap(size_in_bits, clear), _flags(flags) {}

given that the value of `clear` doesn't matter when `size_in_bits` is 0. I didn't do that, but could be swayed to do that.

-------------

Commit messages:
 - 8296774: Removed default MEMFLAGS value from CHeapBitMap

Changes: https://git.openjdk.org/jdk/pull/11084/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11084&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8296774
  Stats: 20 lines in 13 files changed: 0 ins; 1 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/11084.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11084/head:pull/11084

PR: https://git.openjdk.org/jdk/pull/11084

From stefank at openjdk.org Thu Nov 10 10:15:11 2022
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 10 Nov 2022 10:15:11 GMT
Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v2]
In-Reply-To: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com>
References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com>
Message-ID: <LEoBvJqATCkwqXFQAxjjgrsKfDdOQckrsSkTNhAEGYE=.0948e37a-3480-48d1-a2b1-e1e41b54e2e1@github.com>

> Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array.
>
> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it.
>
> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed.
>
> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see [JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS moves to the front, we can now skip having to figure out an initial capacity.

Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:

  8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray

-------------

Changes: https://git.openjdk.org/jdk/pull/11086/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=01
  Stats: 155 lines in 60 files changed: 24 ins; 6 del; 125 mod
  Patch: https://git.openjdk.org/jdk/pull/11086.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11086/head:pull/11086

PR: https://git.openjdk.org/jdk/pull/11086

From stefank at openjdk.org Thu Nov 10 10:22:45 2022
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 10 Nov 2022 10:22:45 GMT
Subject: RFR: 8296774: Removed default MEMFLAGS value from CHeapBitMap [v2]
In-Reply-To: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com>
References: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com>
Message-ID: <isxb6GuvewA4RY9rOOYBDWqk4w0EkNH2aEVat1Sr_Ho=.b8eaa6c1-0f03-4d51-857d-7fdab154d374@github.com>

> Today it is easy to accidentally create CHeapBitMaps that use the default mtInternal MEMFLAGS instead of a value that is suitable for the subsystem. I fixed the instances I could find with #10948 / [JDK-8296231](https://bugs.openjdk.org/browse/JDK-8296231).
>
> For that PR I didn't want to change the constructors of the bitmap because #10941 / [JDK-8296139](https://bugs.openjdk.org/browse/JDK-8296139) was out for review. Now that that change has been pushed I'd like to change the constructors of the CHeapBitMap, so that we don't accidentally make these mistakes.
>
> When making it mandatory to pass MEMFLAGS, it becomes apparent that the current parameter order is a bit odd. If you look closely you see that all three parameters are optional. When I now want to make MEMFLAGS mandatory, I'd like to move it so that it always is the first parameter. This will simplify the constructors a bit, IMHO.
>
> This is what the constructors look like before the patch:
>
>     CHeapBitMap() : CHeapBitMap(mtInternal) {}
>     explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {}
>     CHeapBitMap(idx_t size_in_bits, MEMFLAGS flags = mtInternal, bool clear = true);
>
> And I'd like to change it to:
>
>     explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {}
>     CHeapBitMap(MEMFLAGS flags, idx_t size_in_bits, bool clear = true);
>
> In effect, this makes `flags` mandatory and `size_in_bits` and `clear` optional.
>
> We could probably condense this even further into just one constructor:
>
>     explicit CHeapBitMap(MEMFLAGS flags, size_t size_in_bits = 0, bool clear = true) : GrowableBitMap(size_in_bits, clear), _flags(flags) {}
>
> given that the value of `clear` doesn't matter when `size_in_bits` is 0. I didn't do that, but could be swayed to do that.

Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8296774_bitmap_stricter_construction - 8296774: Removed default MEMFLAGS value from CHeapBitMap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11084/files - new: https://git.openjdk.org/jdk/pull/11084/files/d83e5149..d1a3069b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11084&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11084&range=00-01 Stats: 506 lines in 165 files changed: 84 ins; 53 del; 369 mod Patch: https://git.openjdk.org/jdk/pull/11084.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11084/head:pull/11084 PR: https://git.openjdk.org/jdk/pull/11084 From luhenry at openjdk.org Thu Nov 10 11:01:48 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 10 Nov 2022 11:01:48 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v9] In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> Message-ID: <kzh7eK-D3LFHAaTqksnjLvDbnMdaHEnxl4Ij6mru8NY=.65f95228-8e8e-46b6-85e2-209f84581496@github.com> > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. 
> > It passes `hotspot:tier1` test suite Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10884/files - new: https://git.openjdk.org/jdk/pull/10884/files/0e92909e..3f70a21c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=07-08 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10884.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10884/head:pull/10884 PR: https://git.openjdk.org/jdk/pull/10884 From sspitsyn at openjdk.org Thu Nov 10 11:27:32 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 10 Nov 2022 11:27:32 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v2] In-Reply-To: <LEoBvJqATCkwqXFQAxjjgrsKfDdOQckrsSkTNhAEGYE=.0948e37a-3480-48d1-a2b1-e1e41b54e2e1@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <LEoBvJqATCkwqXFQAxjjgrsKfDdOQckrsSkTNhAEGYE=.0948e37a-3480-48d1-a2b1-e1e41b54e2e1@github.com> Message-ID: <ofVu3McQdAaZctaMopNVrQ8ZL84EHJeX4g8PPI-AkKk=.da0d60c9-123b-41fa-a906-e509c4fdfda6@github.com> On Thu, 10 Nov 2022 10:15:11 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. >> >> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. >> >> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. 
>>
>> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see [JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS moves to the front, we can now skip having to figure out an initial capacity.
>
> Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>
> 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray

Marked as reviewed by sspitsyn (Reviewer).

All the serviceability files look okay to me.

Thanks,
Serguei

-------------

PR: https://git.openjdk.org/jdk/pull/11086

From yzhu at openjdk.org Thu Nov 10 12:36:36 2022
From: yzhu at openjdk.org (Yanhong Zhu)
Date: Thu, 10 Nov 2022 12:36:36 GMT
Subject: RFR: 8296301: Interpreter(RISC-V): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v2]
In-Reply-To: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com>
References: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com>
Message-ID: <XnhDHudl-5P-rb0pGwK5nGiZmgmu3mK7HcRl5XUGz3Q=.05d5e32c-6888-4ede-8c62-b01cb1efe398@github.com>

> In this patch, count_bytecode() is modified by using "x7" as a temporary register. Also implement histogram_bytecode() and histogram_bytecode_pair(), which can be enabled in debug mode by setting the options PrintBytecodeHistogram and PrintBytecodePairHistogram.
>
> The following is the output when PrintBytecodeHistogram or PrintBytecodePairHistogram is TRUE.
>
> $ java -XX:+PrintBytecodeHistogram --version|head -n 20
> openjdk 20 2022-11-09
> OpenJDK Runtime Environment (fastdebug build 20)
> OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode)
>
> Histogram of 8101142 executed bytecodes:
>
>   absolute  relative  code  name
> ----------------------------------------------------------------------
>     634592     7.83%    dc  fast_aload_0
>     471840     5.82%    b6  invokevirtual
>     376275     4.64%    2b  aload_1
>     358520     4.43%    e0  fast_iload
>     332267     4.10%    de  fast_aaccess_0
>     270189     3.34%    a7  goto
>     249831     3.08%    19  aload
>     223361     2.76%    b9  invokeinterface
>     215666     2.66%    1c  iload_2
>     194877     2.41%    b8  invokestatic
>     192212     2.37%    2c  aload_2
>     185826     2.29%    1b  iload_1
>
> $ java -XX:+PrintBytecodePairHistogram --version|head -n 20
> openjdk 20 2022-11-09
> OpenJDK Runtime Environment (fastdebug build 20)
> OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode)
>
> Histogram of 7627721 executed bytecode pairs:
>
>   absolute  relative  codes  1st bytecode     2nd bytecode
> ----------------------------------------------------------------------
>     102673    1.346%  84 a7  iinc             goto
>      85429    1.120%  dc 2b  fast_aload_0     aload_1
>      84394    1.106%  dc b6  fast_aload_0     invokevirtual
>      73131    0.959%  b7 dc  invokespecial    fast_aload_0
>      64605    0.847%  2b b6  aload_1          invokevirtual
>      64086    0.840%  dc b9  fast_aload_0     invokeinterface
>      63663    0.835%  b6 dc  invokevirtual    fast_aload_0
>      59946    0.786%  b6 de  invokevirtual    fast_aaccess_0
>      56631    0.742%  36 e0  istore           fast_iload
>      51261    0.672%  b9 de  invokeinterface  fast_aaccess_0
>      49556    0.650%  3a 19  astore           aload
>      49106    0.644%  a7 e0  goto             fast_iload

Yanhong Zhu has updated the pull request incrementally with one additional commit since the last revision:

  replace atomic_addalw with atomic_addw

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/11051/files
  - new: https://git.openjdk.org/jdk/pull/11051/files/753a384d..26e70dfd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11051&range=01
 - incr:
https://webrevs.openjdk.org/?repo=jdk&pr=11051&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/11051.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11051/head:pull/11051 PR: https://git.openjdk.org/jdk/pull/11051 From yzhu at openjdk.org Thu Nov 10 12:36:37 2022 From: yzhu at openjdk.org (Yanhong Zhu) Date: Thu, 10 Nov 2022 12:36:37 GMT Subject: RFR: 8296301: Interpreter(RISC-V): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v2] In-Reply-To: <aA_8qld8C5G6zCBuwSqx_dHM3qWUlz9Iep-DHYl4Rl8=.6da2d031-7d32-49ce-a4c7-1aae49057990@github.com> References: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com> <aA_8qld8C5G6zCBuwSqx_dHM3qWUlz9Iep-DHYl4Rl8=.6da2d031-7d32-49ce-a4c7-1aae49057990@github.com> Message-ID: <tEv2mrQsOcK64PE0l58AErhJNo1uQyYkavWJHCkNi5Q=.f2d38e4b-9f37-49a8-910d-3bee3404071f@github.com> On Thu, 10 Nov 2022 08:08:49 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Yanhong Zhu has updated the pull request incrementally with one additional commit since the last revision: >> >> replace atomic_addalw with atomic_addw > > src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 1739: > >> 1737: void TemplateInterpreterGenerator::count_bytecode() { >> 1738: __ mv(x7, (address) &BytecodeCounter::_counter_value); >> 1739: __ atomic_addalw(noreg, 1, x7); > > I think a simpler __ atomic_addw will do here? I don't think there is any memory ordering issue here. Thank you for your review. Fixed. 
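
The review point above, that a statistics counter needs an atomic increment but no memory ordering, has a close analogue in portable C++. The sketch below uses `std::atomic` rather than the RISC-V assembler, and it assumes that `atomic_addalw` is the acquire/release form and `atomic_addw` the unordered one. The names `counter_value`, `count_bytecode`, `reset_counter` and `read_counter` are hypothetical stand-ins.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for BytecodeCounter::_counter_value.
static std::atomic<std::uint32_t> counter_value{0};

// Relaxed increment: the read-modify-write is atomic, but it imposes
// no ordering on the memory accesses around it.
inline void count_bytecode() {
    counter_value.fetch_add(1, std::memory_order_relaxed);
}

// Reset the counter to zero.
inline void reset_counter() {
    counter_value.store(0, std::memory_order_relaxed);
}

// Read the current count.
inline std::uint32_t read_counter() {
    return counter_value.load(std::memory_order_relaxed);
}
```

Because nothing else is published through the counter, concurrent callers still never lose an increment; only the relative order of the increment and unrelated loads or stores is left unspecified.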
------------- PR: https://git.openjdk.org/jdk/pull/11051 From fyang at openjdk.org Thu Nov 10 13:09:29 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 10 Nov 2022 13:09:29 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v9] In-Reply-To: <kzh7eK-D3LFHAaTqksnjLvDbnMdaHEnxl4Ij6mru8NY=.65f95228-8e8e-46b6-85e2-209f84581496@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> <kzh7eK-D3LFHAaTqksnjLvDbnMdaHEnxl4Ij6mru8NY=.65f95228-8e8e-46b6-85e2-209f84581496@github.com> Message-ID: <NlwbLR_1HS53YyPVjAGBCSDe_MpIvuafRbEeAOlVtCU=.d4945a6a-34e7-4c63-9208-f5bf68202d22@github.com> On Thu, 10 Nov 2022 11:01:48 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. >> >> It passes `hotspot:tier1` test suite > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by fyang (Reviewer). 
-------------

PR: https://git.openjdk.org/jdk/pull/10884

From aboldtch at openjdk.org Thu Nov 10 13:29:30 2022
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 10 Nov 2022 13:29:30 GMT
Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability
In-Reply-To: <ZJRBAQREVp5EPW0aG1QT0BUA1nYAwsAMOQZBWSOj_hI=.cb00d69e-b8cf-4c0a-b3a5-33299287ee33@github.com>
References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> <ZJRBAQREVp5EPW0aG1QT0BUA1nYAwsAMOQZBWSOj_hI=.cb00d69e-b8cf-4c0a-b3a5-33299287ee33@github.com>
Message-ID: <Y3-LqsM6I1-2wr2Ac_f1OMe1w4O7_DrbaEnaWJfu_-I=.6119349e-ce23-45f8-a113-80b3b8d52252@github.com>

On Tue, 8 Nov 2022 12:54:00 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

> Ok, I was afraid to look at this. I do like STEP_IF. Maybe the longer expressions could be functions?

Are you thinking of the conditions in the STEP_IF macro? I think at least some of the longer expressions can be put on multiple lines. I guess it is already done for `should_report_bug`. I think verbose should always be seen in the macro. A `should_perform_step` might be better, but I feel it might just be a noisy indirection when reading (as all the conditions are fairly simple). I should however fix the implicit bool conversions; they are both against our style guide and ugly. For the step logic I think there are a few places where things can be refactored without losing the ability to easily compare the VMError::report code to the hs_err file output.
-------------

PR: https://git.openjdk.org/jdk/pull/11018

From fyang at openjdk.org Thu Nov 10 13:39:37 2022
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 10 Nov 2022 13:39:37 GMT
Subject: RFR: 8296630: Fix SkipIfEqual on AArch64 and RISC-V
In-Reply-To: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com>
References: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com>
Message-ID: <JsNWjWF2kjspuhIGKKTXG2SFV4Fh6u-vXXEMgSoXgcQ=.24ad756c-921e-462a-bf59-0529e61e26d4@github.com>

On Thu, 10 Nov 2022 03:17:37 GMT, Yadong Wang <yadongwang at openjdk.org> wrote:

> SkipIfEqual was supposed to load a flag value from some memory, compare it with an input boolean value, and jump to a specific label if they are equal. The implementation on x86 and s390 platforms meets expectations, and ppc uses SkipIfEqualZero. However, on AArch64 and RISC-V platforms, the input argument "value" is not used, and only a jump-if-equal-zero is generated. That's not correct, but it works well since only false is passed at all call sites so far.
>
> AArch64 tier1, riscv hotspot & jdk tier1 have been tested.
> Additional cases with dtrace tested on AArch64:
> test/hotspot/jtreg/serviceability/dtrace/DTraceOptionsTest.java
> test/hotspot/jtreg/compiler/runtime/Test8168712.java

LGTM.

-------------

Marked as reviewed by fyang (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11076 From luhenry at openjdk.org Thu Nov 10 13:41:22 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 10 Nov 2022 13:41:22 GMT Subject: Integrated: 8295948: Support for Zicbop/prefetch instructions on RISC-V In-Reply-To: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> References: <mRdbSbte8DTjdvh_o3eiuLbG4O6txcSShEBFnLBjpLs=.ea7f0919-3690-4311-b7ec-8a58626cba96@github.com> Message-ID: <gpx3WxabJvDT-ofT1iy4Dk9Z4_cqXV5moBOUa3YEMTY=.c57d5a02-c567-419d-9432-4b3db6e189fa@github.com> On Thu, 27 Oct 2022 15:18:02 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. > > It passes `hotspot:tier1` test suite This pull request has now been integrated. Changeset: 4465361e Author: Ludovic Henry <luhenry at openjdk.org> Committer: Fei Yang <fyang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/4465361ee9dff1ab6532f343318665b7e50c166e Stats: 84 lines in 3 files changed: 81 ins; 1 del; 2 mod 8295948: Support for Zicbop/prefetch instructions on RISC-V Reviewed-by: fyang, yadongwang ------------- PR: https://git.openjdk.org/jdk/pull/10884 From luhenry at openjdk.org Thu Nov 10 14:09:08 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 10 Nov 2022 14:09:08 GMT Subject: RFR: 8296630: Fix SkipIfEqual on AArch64 and RISC-V In-Reply-To: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com> References: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com> Message-ID: <EE0hTK5oo7gVNlwRaD1raYEC9BpHHvCTov2uYiGlrSs=.d2eb3e4d-fbd2-4195-824f-c0e99f7895ec@github.com> On Thu, 10 Nov 2022 03:17:37 GMT, Yadong Wang <yadongwang at openjdk.org> wrote: > SkipIfEqual was supposed to 
load a flag value from some memory, compare it with an input boolean value, and jump to a specific label if they are equal. The implementation on x86 and s390 platforms meets expectations, and ppc uses SkipIfEqualZero. However, on AArch64 and RISC-V platforms, the input argument "value" is not used, and only a jump-if-equal-zero is generated. That's not correct, but it works well since only false is passed at all call sites so far.
>
> AArch64 tier1, riscv hotspot & jdk tier1 have been tested.
> Additional cases with dtrace tested on AArch64:
> test/hotspot/jtreg/serviceability/dtrace/DTraceOptionsTest.java
> test/hotspot/jtreg/compiler/runtime/Test8168712.java

Marked as reviewed by luhenry (Author).

-------------

PR: https://git.openjdk.org/jdk/pull/11076

From aph at openjdk.org Thu Nov 10 14:09:10 2022
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 10 Nov 2022 14:09:10 GMT
Subject: RFR: 8296630: Fix SkipIfEqual on AArch64 and RISC-V
In-Reply-To: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com>
References: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com>
Message-ID: <ljegzhYzEqbU-g5FxMbqEP4R0qXAoq3xwz27n15tSYk=.f18404f8-eee7-4533-9e6a-625009e46034@github.com>

On Thu, 10 Nov 2022 03:17:37 GMT, Yadong Wang <yadongwang at openjdk.org> wrote:

> SkipIfEqual was supposed to load a flag value from some memory, compare it with an input boolean value, and jump to a specific label if they are equal. The implementation on x86 and s390 platforms meets expectations, and ppc uses SkipIfEqualZero. However, on AArch64 and RISC-V platforms, the input argument "value" is not used, and only a jump-if-equal-zero is generated. That's not correct, but it works well since only false is passed at all call sites so far.
>
> AArch64 tier1, riscv hotspot & jdk tier1 have been tested.
> Additional cases with dtrace tested on AArch64:
> test/hotspot/jtreg/serviceability/dtrace/DTraceOptionsTest.java
> test/hotspot/jtreg/compiler/runtime/Test8168712.java

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3984:

> 3982: _masm->cbnzw(rscratch1, _label);
> 3983: }
> 3984: }

Suggestion:

      if (value) {
        _masm->cbnzw(rscratch1, _label);
      } else {
        _masm->cbzw(rscratch1, _label);
      }
    }

-------------

PR: https://git.openjdk.org/jdk/pull/11076

From stuefe at openjdk.org Thu Nov 10 14:23:33 2022
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 10 Nov 2022 14:23:33 GMT
Subject: RFR: JDK-8296437: NMT incurs costs if disabled
In-Reply-To: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com>
References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com>
Message-ID: <VYBOovwmpFXj-clL2jMWF71wRj82yjtP7nr0tXoq9cA=.f9851ed2-966d-4034-9756-f850ad586f6b@github.com>

On Tue, 8 Nov 2022 14:40:10 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled.
>
> The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air:
>
>     #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \
>                         NativeCallStack(0) : NativeCallStack::empty_stack())
>     #define CALLER_PC  ((MemTracker::tracking_level() == NMT_detail) ? \
>                         NativeCallStack(1) : NativeCallStack::empty_stack())
>
> and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc:
>
>     void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack);
>
> In CURRENT|CALLER_PC, the left-hand side of the ':' operator handles the detail mode, when we actually do collect a stack.
In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand side of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()).
>
> However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16-byte loads, and two 16-byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots):
>
>     0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>:
>     ...
>     # Load tracking level
>       cb9a77: 48 8d 1d 02 35 78 00   lea    0x783502(%rip),%rbx   # 143cf80 <_ZN10MemTracker15_tracking_levelE>
>       cb9a7e: 8b 03                  mov    (%rbx),%eax
>     # detail (3) tracking?
>       cb9a80: 83 f8 03               cmp    $0x3,%eax
>     # yes: go and collect callstack
>       cb9a83: 0f 84 57 01 00 00      je     cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180>
>     # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals:
>       cb9a89: 48 8d 05 30 44 78 00   lea    0x784430(%rip),%rax   # 143dec0 <_ZN15NativeCallStack12_empty_stackE>
>       cb9a90: f3 0f 6f 00            movdqu (%rax),%xmm0
>       cb9a94: f3 0f 6f 48 10         movdqu 0x10(%rax),%xmm1
>       cb9a99: 0f 11 45 c0            movups %xmm0,-0x40(%rbp)
>       cb9a9d: 0f 11 4d d0            movups %xmm1,-0x30(%rbp)
>     ...
>     # do the actual malloc:
>       cb9af8: e8 c3 40 5d ff         callq  28dbc0 <malloc@plt>
>
>     # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX):
>       cb9b0f: 48 8d 4d c0            lea    -0x40(%rbp),%rcx
>     ...
>       cb9b19: e8 f2 b7 f3 ff         callq  bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack>
>
> This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled.
>
> ---------------------
>
> The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only.
>
> This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC.
>
> In the end I settled for replacing the explicit calls to `NativeCallStack::empty_stack()` with calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, and the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway:
>
>     0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>:
>     ...
>     # load tracking level
>       cb9907: 48 8d 1d 72 46 78 00   lea    0x784672(%rip),%rbx   # 143df80 <_ZN10MemTracker15_tracking_levelE>
>       cb990e: 8b 03                  mov    (%rbx),%eax
>     # detail (3) tracking?
>       cb9910: 83 f8 03               cmp    $0x3,%eax
>     # yes: go and collect callstack
>       cb9913: 0f 84 37 01 00 00      je     cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160>
>     # no: nothing more to do ...
>     ...
>     # do the actual malloc:
>       cb9af8: e8 c3 40 5d ff         callq  28dbc0 <malloc@plt>
>     ...
>     # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway:
>       cb9987: 48 8d 4d c0            lea    -0x40(%rbp),%rcx
>     ..
>       cb9991: e8 ba b8 f3 ff         callq  bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack>
>
> There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent.
>
> --------------
>
> Results:
>
> When profiling, I see os::malloc now needs fewer cycles, and the hotspot around the xmm instructions is not there anymore.
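
The copy described in the quoted analysis follows from how the conditional operator combines a temporary with a reference to an existing object: the result is a temporary, so the existing object has to be copied into it. A stand-alone sketch of that effect, assuming C++17 or later and using an invented `StackSketch` type that counts its copy constructions (it is not `NativeCallStack`, and `callee`, `copies_with_singleton` and `copies_with_default_ctor` are hypothetical names):

```cpp
#include <cassert>

// Hypothetical miniature of NativeCallStack that counts copy constructions.
struct StackSketch {
    static int copies;
    void* frames[4];

    StackSketch() {}                                   // no-op: frames stay uninitialized
    explicit StackSketch(int skip) : frames{} { (void)skip; }  // stands in for a stack walk
    StackSketch(const StackSketch& other) {
        for (int i = 0; i < 4; ++i) frames[i] = other.frames[i];
        ++copies;
    }

    // Pre-created "empty stack" singleton, returned by const reference.
    static const StackSketch& empty_stack() {
        static const StackSketch instance(0);
        return instance;
    }
};
int StackSketch::copies = 0;

// Callee takes the stack by const reference and, like summary mode, ignores it.
inline void callee(const StackSketch&) {}

// Old macro shape: temporary on one arm, singleton reference on the other.
inline int copies_with_singleton(bool detail) {
    StackSketch::copies = 0;
    callee(detail ? StackSketch(0) : StackSketch::empty_stack());
    return StackSketch::copies;
}

// New macro shape: both arms are temporaries, the cheap arm is a no-op constructor.
inline int copies_with_default_ctor(bool detail) {
    StackSketch::copies = 0;
    callee(detail ? StackSketch(0) : StackSketch());
    return StackSketch::copies;
}
```

In the first shape the non-detail path copies the singleton once per call even though the callee only ever sees a reference; in the second shape there is nothing to copy, which matches the disappearance of the `movdqu`/`movups` pairs in the second listing.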
x86 test error in compiler/c2/TestVerifyGraphEdges.java unrelated ------------- PR: https://git.openjdk.org/jdk/pull/11040 From stuefe at openjdk.org Thu Nov 10 14:44:10 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 10 Nov 2022 14:44:10 GMT Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability In-Reply-To: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> Message-ID: <K7ecSrF5ZS8jqpkV61LEACfbIi2SGCHuRIgn3G6Rtpk=.f6120ab6-0e9d-4f84-b4e2-2c027fcc953e@github.com> On Mon, 7 Nov 2022 13:25:53 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Refactor the STEP macro in VMError::report to improve readability. > Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand. > > This enhancement aims to do two things: > 1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer. > 2. Separates the internal step logic from the decision logic which decides if a step should be taken with a STEP_IF(step_name_str, condition) macro > > Testing: tier 1 + GHA One thing that worries me a bit is that our test coverage is not that great. We have jtreg/runtime/Errorhandling, but that is not much. In particular we miss good tests for robustness (e.g. that alert us if more STEPs start failing, that reporting can cope with a partially corrupted JVM context, or an invalid Thread::current, or pre-init, or if we have very little stack or C-Heap left)... Unfortunately, tests like these tend to be annoyingly hard to get right. At SAP, we have more tests, but never really managed to get them completely clean. 
-------------

PR: https://git.openjdk.org/jdk/pull/11018

From redestad at openjdk.org Thu Nov 10 14:54:45 2022
From: redestad at openjdk.org (Claes Redestad)
Date: Thu, 10 Nov 2022 14:54:45 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v10]
In-Reply-To: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com>
References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com>
Message-ID: <oVLMrnB2OBHgaRew9pnS0XLAJGan0EadYhtml0Xtclg=.533fb389-1ca6-44e7-b623-e72cb5cc0050@github.com>

> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>
> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>
> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>
> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>
> Baseline:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>
> I.e. no measurable overhead compared to baseline even for `size == 1`.
>
> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>
> Benchmark for `Arrays.hashCode`:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>
> Baseline:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>
>
> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.

Claes Redestad has updated the pull request incrementally with two additional commits since the last revision:

 - Final touch-ups, restored 2-stride with dependency chain breakage
 - Minor cleanup

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10847/files
  - new: https://git.openjdk.org/jdk/pull/10847/files/853a7575..af197062

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=08-09

  Stats: 182 lines in 8 files changed: 43 ins; 74 del; 65 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Thu Nov 10 14:54:48 2022
From: redestad at openjdk.org (Claes Redestad)
Date: Thu, 10 Nov 2022 14:54:48 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v9]
In-Reply-To: <s7h5Q4AbJGEUg8HH2ffWEbGH7aj4OwIDZ-b7C3HTfe8=.d446582a-8379-4aaa-8938-98de3c5cbb01@github.com>
References:
<dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <gI7FMBYJotjnzPUDzLHCXROrrrxwnBRc1rJ5odyegk4=.bac9cc2a-f6a4-4541-9e28-956675052115@github.com> <s7h5Q4AbJGEUg8HH2ffWEbGH7aj4OwIDZ-b7C3HTfe8=.d446582a-8379-4aaa-8938-98de3c5cbb01@github.com> Message-ID: <sPZ-3lqnvg8QJ1bsPPQXuAY48txq866zUTCym57yfis=.ee21a055-6bab-482f-afae-81828d3447e5@github.com> On Wed, 9 Nov 2022 02:35:24 GMT, David Schlosnagle <duke at openjdk.org> wrote: >> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 55 commits: >> >> - Revert accidental ModuleHashes change >> - Merge branch 'master' into 8282664-polyhash >> - Merge pull request #2 from luhenry/dev/cl4es/8282664-polyhash >> >> Unroll + Reorder BBs >> - fixup! Handle size=0 and size=1 in Java >> - Handle size=0 and size=1 in Java >> - reorder BB to do single scalar first to avoid slowdown of short arrays, longer arrays jumps will be amortized by speedups >> - Unroll loop for cnt1 < 32 >> - Merge pull request #1 from luhenry/dev/cl4es/8282664-polyhash >> >> Switch to forward approach for vectorization >> - Fix vector loop >> - fix indexing >> - ... and 45 more: https://git.openjdk.org/jdk/compare/dd5d4df5...853a7575 > > src/hotspot/share/opto/matcher.cpp line 1707: > >> 1705: if (x >= _LAST_MACH_OPER) { >> 1706: fprintf(stderr, "x = %d, _LAST_MACH_OPER = %d\n", x, _LAST_MACH_OPER); >> 1707: fprintf(stderr, "dump n\n"); > > Should this be removed before merging? > Suggestion: Yes, fixed these in the latest version. 
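
For readers skimming this thread, the following is a minimal plain-Java sketch of the kind of loop under discussion: a polynomial hash with factor 31, computed once in the straightforward way and once with a 2-element stride, so that only one multiplication per two elements sits on the loop-carried dependency chain. The class and method names (`PolyHashSketch`, `simple`, `twoStride`) are made up for illustration. This is not the code from the PR, nor the intrinsic itself; it only assumes the factor of 31 and the start values (one for `Arrays`, zero for `String`) that the PR description mentions.

```java
import java.util.Arrays;

public class PolyHashSketch {
    // Straightforward loop: each iteration depends on the previous result.
    public static int simple(int initial, int[] a) {
        int h = initial;
        for (int v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    // 2-element stride: 31 * (31 * h + a[i]) + a[i + 1]
    // is rewritten as 31 * 31 * h + 31 * a[i] + a[i + 1].
    // Only the (31 * 31) * h term depends on the previous iteration.
    // int arithmetic wraps, so both forms give the same result.
    public static int twoStride(int initial, int[] a) {
        int h = initial;
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            h = 31 * 31 * h + 31 * a[i] + a[i + 1];
        }
        if (i < a.length) { // odd length: one trailing element
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2};
        // Start value 1 corresponds to Arrays.hashCode(int[]).
        System.out.println(twoStride(1, data) == Arrays.hashCode(data));
        // Start value 0 corresponds to String.hashCode().
        int[] chars = "hello".chars().toArray();
        System.out.println(twoStride(0, chars) == "hello".hashCode());
    }
}
```

The sketch handles lengths 0 and 1 without a separate preamble: the strided loop simply does not execute, and the trailing-element branch covers the single-element case.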
------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Thu Nov 10 14:57:53 2022 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 10 Nov 2022 14:57:53 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v11] In-Reply-To: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> Message-ID: <2bXXJpyuGGH_dyzzfxu4cN3NFGmwjgjcCxz2mUONkc0=.81046071-5562-4e7e-bf2f-fbfd1076258c@github.com> > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 
7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 
0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10847/files - new: https://git.openjdk.org/jdk/pull/10847/files/af197062..2522625c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Thu Nov 10 15:03:26 2022 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 10 Nov 2022 15:03:26 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v12] In-Reply-To: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> Message-ID: 
<4DATzQcc3E5BBS0xrbxkKDyI64Lt-vpKvtgTGDh6Rew=.5bb45e2c-65bd-4c38-9a30-47feac3a32ca@github.com> > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. 
> > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 
0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Qualified guess on shenandoahSupport fix-up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10847/files - new: https://git.openjdk.org/jdk/pull/10847/files/2522625c..871f6cef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=10-11 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Thu Nov 10 15:07:14 2022 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 10 Nov 2022 15:07:14 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: <tUUED_AhQ-59DfLrgFc1EijmNCuhz4uF-v15WZGwwQ8=.4e8e62f5-5f27-4a47-a8f0-ebf6adb41e20@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <tUUED_AhQ-59DfLrgFc1EijmNCuhz4uF-v15WZGwwQ8=.4e8e62f5-5f27-4a47-a8f0-ebf6adb41e20@github.com> Message-ID: <FrBD6MgaJN5nmkeck9q6LihoScDLfyzE__ay4oHpet0=.7dae187b-4100-4637-9c53-a637503c51f4@github.com> On Tue, 25 Oct 2022 16:03:28 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. 
To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ? 
0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). 
I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> I did a quick write up explaining the approach at https://gist.github.com/luhenry/2fc408be6f906ef79aaf4115525b9d0c. Also, you can find details in @richardstartin's [blog post](https://richardstartin.github.io/posts/vectorised-polynomial-hash-codes)

I've restored the 2-stride dependency-chain breaking implementation that got lost in translation when @luhenry and I took turns on this. This helps keep things fast in the 1-31 size range, and allows for a decent speed-up on `byte[]` and `short[]` cases until we can figure out how to vectorize those properly.

@luhenry baseline:

Benchmark                               (size)  Mode  Cnt     Score    Error  Units
StringHashCode.Algorithm.defaultLatin1       0  avgt    5     0.786 ±  0.005  ns/op
StringHashCode.Algorithm.defaultLatin1       1  avgt    5     1.068 ±  0.005  ns/op
StringHashCode.Algorithm.defaultLatin1       2  avgt    5     2.513 ±  0.017  ns/op
StringHashCode.Algorithm.defaultLatin1      31  avgt    5    22.837 ±  0.082  ns/op
StringHashCode.Algorithm.defaultLatin1      32  avgt    5    16.622 ±  0.107  ns/op
StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1193.884 ±  1.862  ns/op
StringHashCode.Algorithm.defaultUTF16        0  avgt    5     0.786 ±  0.002  ns/op
StringHashCode.Algorithm.defaultUTF16        1  avgt    5     1.884 ±  0.002  ns/op
StringHashCode.Algorithm.defaultUTF16        2  avgt    5     2.512 ±  0.011  ns/op
StringHashCode.Algorithm.defaultUTF16       31  avgt    5    23.061 ±  0.119  ns/op
StringHashCode.Algorithm.defaultUTF16       32  avgt    5    16.429 ±  0.044  ns/op
StringHashCode.Algorithm.defaultUTF16    10000  avgt    5  1191.283 ±  4.600  ns/op

Patch:

Benchmark                               (size)  Mode  Cnt     Score    Error  Units
StringHashCode.Algorithm.defaultLatin1       0  avgt    5     0.787 ±  0.004  ns/op
StringHashCode.Algorithm.defaultLatin1       1  avgt    5     1.050 ±  0.009  ns/op
StringHashCode.Algorithm.defaultLatin1       2  avgt    5     2.198 ±  0.010  ns/op
StringHashCode.Algorithm.defaultLatin1      31  avgt    5    18.413 ±  0.516  ns/op
StringHashCode.Algorithm.defaultLatin1      32  avgt    5    16.599 ±  0.074  ns/op
StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1189.958 ±  8.420  ns/op
StringHashCode.Algorithm.defaultUTF16        0  avgt    5     0.785 ±  0.002  ns/op
StringHashCode.Algorithm.defaultUTF16        1  avgt    5     1.885 ±  0.006  ns/op
StringHashCode.Algorithm.defaultUTF16        2  avgt    5     2.219 ±  0.146  ns/op
StringHashCode.Algorithm.defaultUTF16       31  avgt    5    19.052 ±  1.203  ns/op
StringHashCode.Algorithm.defaultUTF16       32  avgt    5    16.558 ±  0.107  ns/op
StringHashCode.Algorithm.defaultUTF16    10000  avgt    5  1188.122 ±  9.394  ns/op

The switches @luhenry added for the 0 and 1 cases help marginally by allowing the compilation to do early returns in these cases, avoiding the jumping around that would be necessary in the inlined intrinsic. It allowed me to simplify the previous attempt at a 2-element stride routine, while ensuring the routine is correct even if we'd call it directly without the switch preamble.

I think this is ready for a final review now.

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From mcimadamore at openjdk.org Thu Nov 10 15:09:23 2022
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Thu, 10 Nov 2022 15:09:23 GMT
Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v2]
In-Reply-To: <dMQ238rubrgCcyRv-456L2yibOvwrz1gWFPUOQTM5TQ=.31dcad26-a466-4806-ab72-33eb1c89a6ef@github.com>
References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <dMQ238rubrgCcyRv-456L2yibOvwrz1gWFPUOQTM5TQ=.31dcad26-a466-4806-ab72-33eb1c89a6ef@github.com>
Message-ID: <i6cFFRmj8BY5ykF4ROioLuh1J0kFrKTRgyHf4fIFeSM=.aeeba4fe-3efa-4b3d-b31e-3f428b0c33f5@github.com>

On Wed, 9 Nov 2022 18:16:59 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:

>> Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK.
>>
>> This is split off from the main JEP integration to make reviewing easier.
>>
>> This includes the following patches:
>>
>> 1. https://github.com/openjdk/panama-foreign/pull/698
>> 2. https://github.com/openjdk/panama-foreign/pull/699
>> 3. (part of) https://github.com/openjdk/panama-foreign/pull/731
>> 4. https://github.com/openjdk/panama-foreign/pull/740
>> 5. https://github.com/openjdk/panama-foreign/pull/746
>> 6. https://github.com/openjdk/panama-foreign/pull/742
>> 7. https://github.com/openjdk/panama-foreign/pull/743
>>
>> Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments.
>>
>> The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record.
>>
>> Please refer to the PR of each individual patch for a more detailed description.
>
> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Work around x86 failures
>  - Review comments

Java changes look good - added a few nits

src/java.base/share/classes/java/lang/foreign/Linker.java line 319:

> 317: * {@return A linker option used to save portions of the execution state immediately after
> 318: * calling a foreign function associated with a downcall method handle,
> 319: * before it can be overwritten by the runtime, or read through conventional means}

Suggestion:

* before it can be overwritten by the Java runtime, or read through conventional means}

src/java.base/share/classes/java/lang/foreign/Linker.java line 340:

> 338: * before it can be overwritten by the runtime, or read through conventional means.
> 339: * <p>
> 340: * State is captured by a downcall method handle on invocation, by writing it

Suggestion:

* Execution state is captured by a downcall method handle on invocation, by writing it

src/java.base/share/classes/jdk/internal/foreign/abi/CallingSequence.java line 188:

> 186: }
> 187:
> 188: public int capturedStateMask() {

Isn't this a final static during execution?

src/java.base/share/classes/jdk/internal/foreign/abi/NativeEntryPoint.java line 78:

> 76: private static void checkType(MethodType methodType, boolean needsReturnBuffer, int savedValueMask) {
> 77: if (methodType.parameterType(0) != long.class) {
> 78: throw new IllegalArgumentException("Address expected as first param: " + methodType);

Is throwing IAE correct here? E.g. can the user do anything about it, or does the exception describe more of an internal error? (In that case AssertionError might be better?)

src/java.base/share/classes/jdk/internal/foreign/abi/NativeEntryPoint.java line 83:

> 81: if ((needsReturnBuffer && methodType.parameterType(checkIdx++) != long.class)
> 82: || (savedValueMask != 0 && methodType.parameterType(checkIdx) != long.class)) {
> 83: throw new IllegalArgumentException("return buffer and/or preserved value address expected: " + methodType);

Same here

src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/CallArranger.java line 3:

> 1: /*
> 2: * Copyright (c) 2020, 2022, Oracle and/or its affiliates. All rights reserved.
> 3: * Copyright (c) 2019, 2022, Arm Limited. All rights reserved.

Not sure if all copyrights of all changed classes have been tweaked? This might be a more general problem with FFM API.

test/jdk/ProblemList.txt line 484:

> 482: # jdk_foreign
> 483:
> 484: java/foreign/callarranger/TestAarch64CallArranger.java generic-x86

Should we exclude these tests on 32 bits in the jtreg header (as I think we do for other tests)?
------------- PR: https://git.openjdk.org/jdk/pull/11019 From jvernee at openjdk.org Thu Nov 10 15:28:20 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 10 Nov 2022 15:28:20 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v3] In-Reply-To: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> Message-ID: <MSdaJK_1PC8Zr8w70uKpB9W_Je4DX9TMXyyX-sZkWJE=.f9f8f6d0-4757-40d0-8d4e-1fb16f3f3382@github.com> > Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. > > This is split off from the main JEP integration to make reviewing easier. > > This includes the following patches: > > 1. https://github.com/openjdk/panama-foreign/pull/698 > 2. https://github.com/openjdk/panama-foreign/pull/699 > 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 > 4. https://github.com/openjdk/panama-foreign/pull/740 > 5. https://github.com/openjdk/panama-foreign/pull/746 > 6. https://github.com/openjdk/panama-foreign/pull/742 > 7. https://github.com/openjdk/panama-foreign/pull/743 > > Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. > > The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. > > Please refer to the PR of each individual patch for a more detailed description. 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Javadoc nits Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11019/files - new: https://git.openjdk.org/jdk/pull/11019/files/9e13922d..eb38b596 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11019.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11019/head:pull/11019 PR: https://git.openjdk.org/jdk/pull/11019 From jvernee at openjdk.org Thu Nov 10 15:28:25 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 10 Nov 2022 15:28:25 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v2] In-Reply-To: <i6cFFRmj8BY5ykF4ROioLuh1J0kFrKTRgyHf4fIFeSM=.aeeba4fe-3efa-4b3d-b31e-3f428b0c33f5@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <dMQ238rubrgCcyRv-456L2yibOvwrz1gWFPUOQTM5TQ=.31dcad26-a466-4806-ab72-33eb1c89a6ef@github.com> <i6cFFRmj8BY5ykF4ROioLuh1J0kFrKTRgyHf4fIFeSM=.aeeba4fe-3efa-4b3d-b31e-3f428b0c33f5@github.com> Message-ID: <adPNJWEXLNY6LPaqX2N0eeiYUQDyRlp4rfJBqZYWoNI=.5ee36196-7bb8-4e28-adde-3cdd09eab781@github.com> On Thu, 10 Nov 2022 14:59:20 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - Work around x86 failures >> - Review comments > > src/java.base/share/classes/jdk/internal/foreign/abi/CallingSequence.java line 188: > >> 186: } >> 187: >> 188: public int capturedStateMask() { > > Isn't this a final static during execution? It's fixed for a particular linkage request, but it can differ between them. 
(for instance, on Windows one downcall handle can save `errno` and another can save `GetLastError`) > src/java.base/share/classes/jdk/internal/foreign/abi/NativeEntryPoint.java line 78: > >> 76: private static void checkType(MethodType methodType, boolean needsReturnBuffer, int savedValueMask) { >> 77: if (methodType.parameterType(0) != long.class) { >> 78: throw new IllegalArgumentException("Address expected as first param: " + methodType); > > Is throwing IAE correct here? E.g. can the user do anything about it, or does the exception describe more of an internal error? (In that case AssertionError might be better?) Yes, it's an internal error. I can change the exception type > test/jdk/ProblemList.txt line 484: > >> 482: # jdk_foreign >> 483: >> 484: java/foreign/callarranger/TestAarch64CallArranger.java generic-x86 > > Should we exclude these tests on 32 bits in the jtreg header (as I think we do for other tests) ? I'm not sure what the conventional move here would be. Adding them to the problem list doesn't seem to make the failures go away in GHA at least. I can exclude them with `@requires` as well. ------------- PR: https://git.openjdk.org/jdk/pull/11019 From stuefe at openjdk.org Thu Nov 10 16:00:05 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 10 Nov 2022 16:00:05 GMT Subject: RFR: 8296796: Provide clean, platform-agnostic interface to C-heap trimming Message-ID: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. 
That will allow us to add implementations for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier. ------------- Commit messages: - JDK-8296796-factor-out-os-trim-native-heap Changes: https://git.openjdk.org/jdk/pull/11089/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11089&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296796 Stats: 141 lines in 10 files changed: 89 ins; 34 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/11089.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11089/head:pull/11089 PR: https://git.openjdk.org/jdk/pull/11089 From stuefe at openjdk.org Thu Nov 10 16:00:06 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 10 Nov 2022 16:00:06 GMT Subject: RFR: 8296796: Provide clean, platform-agnostic interface to C-heap trimming In-Reply-To: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> Message-ID: <UWJmMzEnAQ750ueAGqVj3UwcuK-qkw4vGpNvFYKYlUo=.bb66e1c4-bae6-45f4-b29e-13d5adac052d@github.com> On Thu, 10 Nov 2022 13:23:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. > > We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. That will allow us to add implementations for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier.
riscv build error unrelated ------------- PR: https://git.openjdk.org/jdk/pull/11089 From jvernee at openjdk.org Thu Nov 10 16:48:19 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 10 Nov 2022 16:48:19 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v4] In-Reply-To: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> Message-ID: <FkF6UVTR36LLSMwEk6TQRHD54w34HXls5pI-yguV5ec=.303a0ba0-0f4d-4948-91ef-a2630063ab83@github.com> > Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. > > This is split off from the main JEP integration to make reviewing easier. > > This includes the following patches: > > 1. https://github.com/openjdk/panama-foreign/pull/698 > 2. https://github.com/openjdk/panama-foreign/pull/699 > 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 > 4. https://github.com/openjdk/panama-foreign/pull/740 > 5. https://github.com/openjdk/panama-foreign/pull/746 > 6. https://github.com/openjdk/panama-foreign/pull/742 > 7. https://github.com/openjdk/panama-foreign/pull/743 > > Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. > > The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. > > Please refer to the PR of each individual patch for a more detailed description. 
Jorn Vernee has updated the pull request incrementally with three additional commits since the last revision: - Tweak copyright headers - Use @requires to disable some tests on x86 - Use AssertionError for internal exceptions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11019/files - new: https://git.openjdk.org/jdk/pull/11019/files/eb38b596..7b1b95f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=02-03 Stats: 10 lines in 8 files changed: 6 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/11019.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11019/head:pull/11019 PR: https://git.openjdk.org/jdk/pull/11019 From jvernee at openjdk.org Thu Nov 10 16:48:21 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 10 Nov 2022 16:48:21 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v2] In-Reply-To: <i6cFFRmj8BY5ykF4ROioLuh1J0kFrKTRgyHf4fIFeSM=.aeeba4fe-3efa-4b3d-b31e-3f428b0c33f5@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <dMQ238rubrgCcyRv-456L2yibOvwrz1gWFPUOQTM5TQ=.31dcad26-a466-4806-ab72-33eb1c89a6ef@github.com> <i6cFFRmj8BY5ykF4ROioLuh1J0kFrKTRgyHf4fIFeSM=.aeeba4fe-3efa-4b3d-b31e-3f428b0c33f5@github.com> Message-ID: <TaAVYwu5HvtD2s4yBUJhMhs0iYz6PFV6wV6Ty6agMJc=.4dc06a94-6383-4836-af95-7d980de91eb6@github.com> On Thu, 10 Nov 2022 15:03:48 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - Work around x86 failures >> - Review comments > > src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/CallArranger.java line 3: > >> 1: /* >> 2: * Copyright (c) 2020, 2022, Oracle and/or its affiliates. All rights reserved. >> 3: * Copyright (c) 2019, 2022, Arm Limited. All rights reserved. 
> > Not sure if all copyrights of all changed classes have been tweaked? This might be a more general problem with FFM API. I've updated them for all the changes in this PR ------------- PR: https://git.openjdk.org/jdk/pull/11019 From duke at openjdk.org Thu Nov 10 17:16:30 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Thu, 10 Nov 2022 17:16:30 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <t2zrTAXW_T-i8Q5MmM-n5Yoplvb7Yc4z12LmhtVUYEc=.52d99ea4-6d5e-4657-9fa8-e6dbca7415d0@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> <t2zrTAXW_T-i8Q5MmM-n5Yoplvb7Yc4z12LmhtVUYEc=.52d99ea4-6d5e-4657-9fa8-e6dbca7415d0@github.com> Message-ID: <RB2QneNhRfvMKGBkhUvRNGbOY-ocSBTCshbNAavMVv0=.951c0462-00b0-4dac-81c3-06d912cd7c4e@github.com> On Wed, 9 Nov 2022 12:02:17 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: >> This is an attempt to unify the two different approaches for using archived heap regions. Main goal is to restructure and modify the code to have a single set of GC APIs that can be called for using archived heap regions. >> >> In current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the java heap. >> When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific. >> When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region based collectors. >> >> This PR attempts to add new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. 
Similar to the "loading" API, in this new approach VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also allows the flexibility for the GC implementation to decide where and how to reserve the space for the archived regions. For instance, G1 implementation can continue to attempt to allocate the space towards the end of the heap. >> This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions viz G1, serial, parallel and epsilon. > > Some additional initial notes about the code: > > * please document the new APIs in `CollectedHeap` (something like "// Support for mapping archive regions into the heap" for 6 methods with tons of parameters is not sufficient) - it would also help reviewing. > * also we should be careful with the naming as the term "region" is overloaded enough already; e.g. `heap_region_dealloc_supported` seems to miss an `archive`. > * while we are at changing the API, the change should use appropriate types, i.e. unsigned ints/size_t for sizes. > * there is quite a bit inconsistency in the naming: sometimes the MemRegions are called "something_regions", sometimes "something_ranges", there is also "something_spaces"; the same with the associated counts that are sometimes called "count", and sometimes "num_regions". > > Summarizing all this, I would strongly suggest to use the term "ranges" instead of regions for the areas/MemRegions from the archive. This would help a lot with code clarity in collectors that already use the term regions. > > Similar to @iklam I strongly suggest splitting this change by collector (or at least basic+g1 and epsilon/serial/parallel) anyway. There will likely be considerable changes on top of your current stack of changes which may require working/reviewing on the tip. > > There are additional initial comments in the PR. 
I understand that some of them may become obsolete as we change the API. > > I did not really look at dumptime considerations which seem to be under discussion right now; I do not have the overview about the exact requirements/problems in that area, particularly with regards to JDK-8296344. > The chosen API (and the intended execution flow during dump/runtime) is too underdocumented at the moment to give good comments. It would be nice to summarize that to start a discussion; maybe even discuss this on some mailing list instead of in the PR of a 2k LOC change. > > However, from the existing comments here, I do think it is a good idea to write out all chosen assumptions when dumping (e.g. the alignment boundary of 1MB), even if currently that is the minimum for all collectors. @tschatzl thanks for reviewing the PR. Given that JDK-8296344 is likely to affect this PR and there are still details to be figured out and finalized for the dumping, I guess it's best to put this on hold for now. Once we have the details sorted out, I can resurrect this patch if required.
------------- PR: https://git.openjdk.org/jdk/pull/10970 From tschatzl at openjdk.org Thu Nov 10 19:37:32 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Nov 2022 19:37:32 GMT Subject: RFR: 8295871: G1: Use different explicit claim marks for CLDs [v3] In-Reply-To: <jzNgSnfX-G5q4lbbE4ykGzWZsfZf9vSJfdh9kP1Qjpo=.8c140522-b6c7-4b1e-b77f-22467e5a3bd8@github.com> References: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> <jzNgSnfX-G5q4lbbE4ykGzWZsfZf9vSJfdh9kP1Qjpo=.8c140522-b6c7-4b1e-b77f-22467e5a3bd8@github.com> Message-ID: <dqaMxXkD-VEadJaKzESC2Y3gxDhqLgMfPuRR5imMvvs=.c7cf9574-3f1c-471a-beb2-dbfabc92fc80@github.com> On Wed, 9 Nov 2022 13:17:03 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> ayang review > > If the same technique can be used by Serial/Parallel (I believe so), I'd prefer sth more generic, `_claim_stw_fullgc_mark/adjust`. > > (I am surprised that `_claim_finalizable` is used only by ZGC -- this essentially mirrors finalizable/strong marking, needed for conc ref-processing.) 
Thanks @albertnetymk @kstefanj for your reviews ------------- PR: https://git.openjdk.org/jdk/pull/10989 From tschatzl at openjdk.org Thu Nov 10 19:39:05 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Nov 2022 19:39:05 GMT Subject: Integrated: 8295871: G1: Use different explicit claim marks for CLDs In-Reply-To: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> References: <I3EtBdU9uYpn9yRpuAW8MASeOFOh84YYXOji3_5CJRc=.0498108a-547f-4ebd-bc92-4653f805417c@github.com> Message-ID: <eA6EpF3at8-js_o2izNX9Qngt_gahZ4mq7hBdhhgX2g=.07208ccc-1d89-4b5c-a590-f3de370a09d6@github.com> On Fri, 4 Nov 2022 14:46:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: > Hi all, > > can I have reviews for this follow-up to [JDK-8295118](https://bugs.openjdk.org/browse/JDK-8295118) that removes the need to clear CLD claim marks for every full gc phase by using different claim values for the different phases. > > Some comments: > * I used new g1 specific claim values instead of overloading the existing ones, which is imho clearer. I am open to better names, but something like `_claim_strong_2/3` seemed too cryptic. Then again, there is now a collector specific name in the enum. Maybe the enum values should be made collector-specific in some way? Currently they already are (e.g. `_claim_finalizable` is only used in ZGC) as G1 does not need the values except for (multiple) `_claim_strong`. > * I moved the CLD mark verification for the mark phase from `prepare_collection` to the constructor of `G1FullGCMarker`; I think this place is more fitting as directly above there is the use in the `CLDToOopClosure`. Also this pattern aligns with the use in the `G1FullGCAdjustTask`. > > Testing: tier1-5 > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: e1badb77 Author: Thomas Schatzl <tschatzl at openjdk.org> URL: https://git.openjdk.org/jdk/commit/e1badb77fb50ba30c8a22d43a641426ff774607b Stats: 27 lines in 7 files changed: 10 ins; 6 del; 11 mod 8295871: G1: Use different explicit claim marks for CLDs Reviewed-by: sjohanss, ayang ------------- PR: https://git.openjdk.org/jdk/pull/10989 From jnimeh at openjdk.org Thu Nov 10 20:11:46 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Thu, 10 Nov 2022 20:11:46 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v3] In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> Message-ID: <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> > This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: > > - x86_64: AVX, AVX2 and AVX512 > - aarch64: platforms that support the advanced SIMD instructions > > Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. > > Special thanks to the folks who have made many helpful comments while this PR was in draft form. 
Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/7702/files - new: https://git.openjdk.org/jdk/pull/7702/files/53b432e5..8d4b7ba7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=7702&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=7702&range=01-02 Stats: 66 lines in 2 files changed: 42 ins; 5 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/7702.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7702/head:pull/7702 PR: https://git.openjdk.org/jdk/pull/7702 From jnimeh at openjdk.org Thu Nov 10 20:15:14 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Thu, 10 Nov 2022 20:15:14 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v3] In-Reply-To: <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> Message-ID: <5clqVh13SSlaNtPkBJ67ufP2DF9vOWyPdWml63Zwbxs=.d0e836cf-15c7-4e3e-aace-2c2aabc2d345@github.com> On Thu, 10 Nov 2022 20:11:46 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: >> >> - x86_64: AVX, AVX2 and AVX512 >> - aarch64: platforms that support the advanced SIMD instructions >> >> Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. >> >> Special thanks to the folks who have made many helpful comments while this PR was in draft form. 
> > Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: > > replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations

using vpshufb (not vpshufd as I typo'ed on my commit message) on AVX/AVX2 for 8-bit and 16-bit left rotations has given us some modest speed gains:

Before (with intrinsics):

AVX=1
ChaCha20.encrypt      256  thrpt  40  1338667.215 ± 12012.240  ops/s
ChaCha20.encrypt     1024  thrpt  40   453682.363 ±  2559.322  ops/s
ChaCha20.encrypt     4096  thrpt  40   124785.645 ±   394.535  ops/s
ChaCha20.encrypt    16384  thrpt  40    31788.969 ±    90.770  ops/s

AVX=2
ChaCha20.encrypt      256  thrpt  40  1893810.127 ± 21870.718  ops/s
ChaCha20.encrypt     1024  thrpt  40   758024.511 ±  5414.552  ops/s
ChaCha20.encrypt     4096  thrpt  40   224032.805 ±   935.309  ops/s
ChaCha20.encrypt    16384  thrpt  40    58112.296 ±   498.048  ops/s

After (using vpshufb):

AVX=1
Benchmark      (dataSize)   Mode  Cnt        Score       Error  Units
ChaCha20.encrypt      256  thrpt  40  1447416.349 ± 14054.478  ops/s
ChaCha20.encrypt     1024  thrpt  40   495844.721 ±  1949.237  ops/s
ChaCha20.encrypt     4096  thrpt  40   138154.478 ±   411.707  ops/s
ChaCha20.encrypt    16384  thrpt  40    35165.143 ±   110.483  ops/s

AVX=2
ChaCha20.encrypt      256  thrpt  40  2020170.211 ± 10507.466  ops/s
ChaCha20.encrypt     1024  thrpt  40   829644.325 ±  6452.931  ops/s
ChaCha20.encrypt     4096  thrpt  40   246066.542 ±  1052.905  ops/s
ChaCha20.encrypt    16384  thrpt  40    64021.363 ±   468.979  ops/s

This was done on the same system that the original benchmarks were done on. None of these changes affect AVX512. I'm working on a hybrid intrinsic approach to get the best of both worlds for those smaller single-part jobs.
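Why a byte shuffle can stand in for the shift/or pair on 8-bit and 16-bit rotations can be sketched in plain Java. This is only a scalar illustration of the idea, not the intrinsic itself; the class and method names are hypothetical. Rotating a 32-bit word left by a multiple of 8 bits only moves whole bytes around, which is the kind of permutation a byte-shuffle instruction performs on every lane at once.

```java
// Hypothetical scalar illustration; the real intrinsic works on vector registers.
class RotateByByteShuffle {

    // Rotates a 32-bit word left by 8 or 16 bits purely by permuting its
    // four bytes. Byte 0 is the least significant byte.
    public static int rotlViaBytePermutation(int x, int bits) {
        if (bits != 8 && bits != 16) {
            throw new IllegalArgumentException("only 8-bit and 16-bit rotations are byte-aligned here");
        }
        int byteShift = bits / 8;
        int result = 0;
        for (int dst = 0; dst < 4; dst++) {
            // Output byte 'dst' is taken from input byte (dst - byteShift) mod 4.
            int src = (dst - byteShift + 4) % 4;
            int b = (x >>> (8 * src)) & 0xFF;
            result |= b << (8 * dst);
        }
        return result;
    }

    // The general form that works for any rotation count in 1..31:
    // two shifts and an OR.
    public static int rotlViaShifts(int x, int bits) {
        return (x << bits) | (x >>> (32 - bits));
    }

    public static void main(String[] args) {
        int x = 0x11223344;
        System.out.println(Integer.toHexString(rotlViaBytePermutation(x, 8)));  // prints 22334411
        System.out.println(Integer.toHexString(rotlViaBytePermutation(x, 16))); // prints 33441122
        System.out.println(rotlViaBytePermutation(x, 8) == rotlViaShifts(x, 8)); // prints true
    }
}
```

The other two ChaCha20 rotation counts, 12 and 7, are not multiples of 8, so they cannot be expressed as a byte permutation and still need the shift/or form (or a dedicated rotate instruction where one exists).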
------------- PR: https://git.openjdk.org/jdk/pull/7702 From sviswanathan at openjdk.org Thu Nov 10 20:27:37 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 10 Nov 2022 20:27:37 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v3] In-Reply-To: <5clqVh13SSlaNtPkBJ67ufP2DF9vOWyPdWml63Zwbxs=.d0e836cf-15c7-4e3e-aace-2c2aabc2d345@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> <5clqVh13SSlaNtPkBJ67ufP2DF9vOWyPdWml63Zwbxs=.d0e836cf-15c7-4e3e-aace-2c2aabc2d345@github.com> Message-ID: <rSxsGLEA_OBsxEgEODIXOkRXAsRX6NqxBeg0FVDNUXs=.6bfa5642-750d-46f2-b74d-3e224690632b@github.com> On Thu, 10 Nov 2022 20:12:30 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: >> >> replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations > > using vpshufb (not vpshufd as I typo'ed on my commit message) on AVX/AVX2 for 8-bit and 16-bit left rotations has given us some modest speed gains: > Before (with intrinsics): > > AVX=1 > ChaCha20.encrypt 256 thrpt 40 1338667.215 ± 12012.240 ops/s > ChaCha20.encrypt 1024 thrpt 40 453682.363 ± 2559.322 ops/s > ChaCha20.encrypt 4096 thrpt 40 124785.645 ± 394.535 ops/s > ChaCha20.encrypt 16384 thrpt 40 31788.969 ± 90.770 ops/s > > AVX=2 > ChaCha20.encrypt 256 thrpt 40 1893810.127 ± 21870.718 ops/s > ChaCha20.encrypt 1024 thrpt 40 758024.511 ± 5414.552 ops/s > ChaCha20.encrypt 4096 thrpt 40 224032.805 ± 935.309 ops/s > ChaCha20.encrypt 16384 thrpt 40 58112.296 ± 498.048 ops/s > > After (using vpshufb): > > AVX=1 > Benchmark (dataSize) Mode Cnt Score Error Units > ChaCha20.encrypt 256 thrpt 40 1447416.349 ± 14054.478 ops/s > ChaCha20.encrypt 1024 thrpt 40 495844.721 ±
1949.237 ops/s > ChaCha20.encrypt 4096 thrpt 40 138154.478 ± 411.707 ops/s > ChaCha20.encrypt 16384 thrpt 40 35165.143 ± 110.483 ops/s > > AVX=2 > ChaCha20.encrypt 256 thrpt 40 2020170.211 ± 10507.466 ops/s > ChaCha20.encrypt 1024 thrpt 40 829644.325 ± 6452.931 ops/s > ChaCha20.encrypt 4096 thrpt 40 246066.542 ± 1052.905 ops/s > ChaCha20.encrypt 16384 thrpt 40 64021.363 ± 468.979 ops/s > > This was done on the same system that the original benchmarks were done on. None of these changes affect AVX512. > > I'm working on a hybrid intrinsic approach to get the best of both worlds for those smaller single-part jobs. @jnimeh Very nice work overall. I think it would be ok to get this PR integrated and do the hybrid approach as a follow on PR. Your work in general shows very good improvement over base. ------------- PR: https://git.openjdk.org/jdk/pull/7702 From iklam at openjdk.org Thu Nov 10 20:45:25 2022 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 10 Nov 2022 20:45:25 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled In-Reply-To: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> Message-ID: <A43rcL2ECFfsPx0O49yVfdBLe6DtEVGBaHguxjI2cXw=.d079db0f-9d81-4a15-a883-ceb8c4640cb1@github.com> On Tue, 8 Nov 2022 14:40:10 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. > > The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : > > > #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \ > NativeCallStack(0) : NativeCallStack::empty_stack()) > #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ?
\ > NativeCallStack(1) : NativeCallStack::empty_stack()) > > > and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc: > > > void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack); > > > In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()). > > However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots): > > > 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>: > ... > # Load tracking level > cb9a77: 48 8d 1d 02 35 78 00 lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE> > cb9a7e: 8b 03 mov (%rbx),%eax > # detail (3) tracking? > cb9a80: 83 f8 03 cmp $0x3,%eax > # yes: go and collect callstack > cb9a83: 0f 84 57 01 00 00 je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180> > # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals: > cb9a89: 48 8d 05 30 44 78 00 lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE> > cb9a90: f3 0f 6f 00 movdqu (%rax),%xmm0 > cb9a94: f3 0f 6f 48 10 movdqu 0x10(%rax),%xmm1 > cb9a99: 0f 11 45 c0 movups %xmm0,-0x40(%rbp) > cb9a9d: 0f 11 4d d0 movups %xmm1,-0x30(%rbp) > ... 
> # do the actual malloc: > cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> > > # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX): > cb9b0f: 48 8d 4d c0 lea -0x40(%rbp),%rcx > ... > cb9b19: e8 f2 b7 f3 ff callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> > > > This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled. > > --------------------- > > The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only. > > This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC. > > In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway: > > > 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>: > ... > # load tracking level > cb9907: 48 8d 1d 72 46 78 00 lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE> > cb990e: 8b 03 mov (%rbx),%eax > # detail (3) tracking? > cb9910: 83 f8 03 cmp $0x3,%eax > # yes: go and collect callstack > cb9913: 0f 84 37 01 00 00 je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160> > # no: nothing more to do ... > ... > # do the actual malloc: > cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> > ... > # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). 
The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway: > cb9987: 48 8d 4d c0 lea -0x40(%rbp),%rcx > .. > cb9991: e8 ba b8 f3 ff callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> > > > There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent. > > -------------- > > Results: > > When profiling, I see os::malloc now needs less cycles, and the hotspot around the xmm instructions is not there anymore. (I commented before but for some reason it's been lost). There's one use of the default constructor that you've missed (I found that by removing the body of the NativeCallStack() constructor): ReservedMemoryRegion(const ReservedMemoryRegion& rr) : VirtualMemoryRegion(rr.base(), rr.size()) { *this = rr; } I think it will be much safer to leave the existing default constructor, and have something like: private: NativeCallStack(int dummy) { _dummy[0] = NULL; } public: inline static NativeCallStack fake_stack() { NativeCallStack fake(0); return fake; } This will keep the behavior the same as before. Note that your patch will change the behavior if the fake stack is actually used. 
E.g., for this function: inline bool is_empty() const { return _stack[0] == NULL; } If you are absolutely sure that the fake stacks are never used, and really want to get rid of the `_stack[0] = NULL`, I would suggest adding a new debug-only field, and add asserts like this in all public functions: inline bool is_empty() const { assert(!is_fake, "sanity"); return _stack[0] == NULL; } ------------- PR: https://git.openjdk.org/jdk/pull/11040 From sviswanathan at openjdk.org Thu Nov 10 22:14:36 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 10 Nov 2022 22:14:36 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11] In-Reply-To: <actl_ZA8F_vYPeNjov0QcGklPyBQjm-geVl1rPLtFpU=.c440119c-5b4e-4c98-9a57-4fa36785bcbf@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <actl_ZA8F_vYPeNjov0QcGklPyBQjm-geVl1rPLtFpU=.c440119c-5b4e-4c98-9a57-4fa36785bcbf@github.com> Message-ID: <Jfk4uciCLP7dRECbUamzrwiLx1QFOjVUA5zXdSL8YCA=.2304f245-f6c2-4337-990e-b05749628b0f@github.com> On Thu, 10 Nov 2022 01:22:04 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 
110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > fix windows and 32b linux builds src/hotspot/share/opto/library_call.cpp line 6981: > 6979: > 6980: if (!stubAddr) return false; > 6981: Node* polyObj = argument(0); Minor cleanup: This could be removed as it is not used. src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 28: > 26: package com.sun.crypto.provider; > 27: > 28: import java.lang.reflect.Field; Minor cleanup: This could be removed. src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 249: > 247: @ForceInline > 248: @IntrinsicCandidate > 249: private void processMultipleBlocks(byte[] input, int offset, int length, long[] aLimbs, long[] rLimbs) { A comment here to indicate aLimbs and rLimbs are part of a and r and used in intrinsic. src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 253: > 251: n.setValue(input, offset, BLOCK_LENGTH, (byte)0x01); > 252: a.setSum(n); // A += (temp | 0x01) > 253: a.setProduct(r); // A = (A * R) % p Comment needs update to match code. 
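For readers following the review comments on `processMultipleBlocks`, the per-block update that the quoted lines perform (`a.setSum(n)` followed by `a.setProduct(r)`) can be sketched as below. This is a minimal sketch, assuming the standard Poly1305 definition with p = 2^130 - 5 and 16-byte little-endian blocks that get a 0x01 byte appended. The class name `Poly1305BlockSketch` and the helper `blockToInt` are hypothetical, and `BigInteger` stands in for the limb arithmetic; it is not the JDK implementation and not the intrinsic.

```java
import java.math.BigInteger;

class Poly1305BlockSketch {
    static final int BLOCK_LENGTH = 16;

    // p = 2^130 - 5, the Poly1305 prime.
    static final BigInteger P =
            BigInteger.ONE.shiftLeft(130).subtract(BigInteger.valueOf(5));

    // Hypothetical helper: read one 16-byte block as a little-endian integer
    // and append the 0x01 byte as the most significant byte (adds 2^128).
    static BigInteger blockToInt(byte[] input, int offset) {
        byte[] bigEndian = new byte[BLOCK_LENGTH + 1];
        bigEndian[0] = 0x01;
        for (int i = 0; i < BLOCK_LENGTH; i++) {
            bigEndian[BLOCK_LENGTH - i] = input[offset + i];
        }
        return new BigInteger(1, bigEndian);
    }

    // For every full block: A = ((A + (block | 0x01 pad)) * R) mod p.
    // Any trailing partial block is left for the caller, as in the quoted loop.
    static BigInteger processMultipleBlocks(byte[] input, int offset, int length,
                                            BigInteger a, BigInteger r) {
        while (length >= BLOCK_LENGTH) {
            BigInteger n = blockToInt(input, offset);
            a = a.add(n).multiply(r).mod(P);
            offset += BLOCK_LENGTH;
            length -= BLOCK_LENGTH;
        }
        return a;
    }

    public static void main(String[] args) {
        // One all-zero block, A = 0, R = 4: (0 + 2^128) * 4 = 2^130, and 2^130 mod p = 5.
        byte[] zeros = new byte[BLOCK_LENGTH];
        System.out.println(
                processMultipleBlocks(zeros, 0, BLOCK_LENGTH, BigInteger.ZERO, BigInteger.valueOf(4)));
    }
}
```

The values of R used here are chosen only to make the arithmetic easy to check by hand; a real key would be clamped first.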
------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Thu Nov 10 22:48:37 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 10 Nov 2022 22:48:37 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v12] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <URDJbHxq9xXluD7xtzFW3JoR3vFeEqFhIlcgCOV7ymQ=.e006e025-1c1e-4e24-b4c2-1ce834b1525b@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 
100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: Sandhya's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/abfc68f4..2176caf8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=10-11 Stats: 6 lines in 2 files changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Thu Nov 10 22:48:38 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 10 Nov 2022 22:48:38 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11] In-Reply-To: <Jfk4uciCLP7dRECbUamzrwiLx1QFOjVUA5zXdSL8YCA=.2304f245-f6c2-4337-990e-b05749628b0f@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <actl_ZA8F_vYPeNjov0QcGklPyBQjm-geVl1rPLtFpU=.c440119c-5b4e-4c98-9a57-4fa36785bcbf@github.com> <Jfk4uciCLP7dRECbUamzrwiLx1QFOjVUA5zXdSL8YCA=.2304f245-f6c2-4337-990e-b05749628b0f@github.com> Message-ID: <jvw805Op3RYdMVfMo_5yaK6RIEXNaLrKwPpSbB5leOI=.00976a93-ff6b-4d51-ac49-33b253fcd47a@github.com> On Thu, 10 Nov 2022 22:03:24 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> fix windows and 32b linux builds > > src/hotspot/share/opto/library_call.cpp line 6981: > >> 6979: >> 6980: if (!stubAddr) return false; >> 6981: Node* polyObj = argument(0); > > Minor cleanup: This could be removed as it is not used. 
done > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 28: > >> 26: package com.sun.crypto.provider; >> 27: >> 28: import java.lang.reflect.Field; > > Minor cleanup: This could be removed. done > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 249: > >> 247: @ForceInline >> 248: @IntrinsicCandidate >> 249: private void processMultipleBlocks(byte[] input, int offset, int length, long[] aLimbs, long[] rLimbs) { > > A comment here to indicate aLimbs and rLimbs are part of a and r and used in intrinsic. done > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 253: > >> 251: n.setValue(input, offset, BLOCK_LENGTH, (byte)0x01); >> 252: a.setSum(n); // A += (temp | 0x01) >> 253: a.setProduct(r); // A = (A * R) % p > > Comment needs update to match code. done ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Thu Nov 10 22:59:52 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 10 Nov 2022 22:59:52 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v13] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <AdNrtqNSMPZSK_Tfbr4q-YNlelgx8AEVeDtn-4EoV6Y=.15cead8f-06fd-40bd-9e39-fbb7ecc2cbd6@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. 
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: jcheck ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/2176caf8..196ee35b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From amenkov at openjdk.org Fri Nov 11 00:51:07 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 11 Nov 2022 00:51:07 GMT Subject: RFR: 8296265: Use modern HTML in the JVMTI spec Message-ID: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com> Changes: - removed `<b>` from TOC; - added CSS style for TOC (to simplify 
customization, currently it's empty); - removed `<b>` from the function list (per Phase); - removed `<b>` from the list of events; - introduced CSS style for bold text, replaced `<b>` tags with `<span class="bold">`; - updated transformation rule for `"b"` elements to use `"span class=bold"` (to handle `<b>` tags in source XML file); - dropped duplicate `"b"` transform. ------------- Commit messages: - Fixed jvmti.xsl Changes: https://git.openjdk.org/jdk/pull/11099/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11099&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296265 Stats: 53 lines in 1 file changed: 2 ins; 16 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/11099.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11099/head:pull/11099 PR: https://git.openjdk.org/jdk/pull/11099 From psandoz at openjdk.org Fri Nov 11 00:51:34 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 11 Nov 2022 00:51:34 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v17] In-Reply-To: <Y6p315Q6Xv7ZWYDz0yMkyJxjYVxz6Iomu5phcti2KJE=.5f20042a-4d70-4dad-88ba-2883e1f8ab5f@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <Y6p315Q6Xv7ZWYDz0yMkyJxjYVxz6Iomu5phcti2KJE=.5f20042a-4d70-4dad-88ba-2883e1f8ab5f@github.com> Message-ID: <g0K-MydrQsjl78yEthfG8UUNB87UYhiALFPKH55b2dQ=.5a700d31-e730-40b6-9986-6214082d0e70@github.com> On Wed, 9 Nov 2022 13:24:54 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Tweak Arena::close javadoc src/java.base/share/classes/java/lang/foreign/Arena.java line 101: > 99: * @throws IllegalArgumentException if {@code bytesSize < 0}, {@code alignmentBytes <= 0}, or if {@code alignmentBytes} > 100: * is not a power of 2. > 101: * @throws IllegalStateException if the session associated with this arena is not {@linkplain MemorySession#isAlive() alive}. Suggestion: * @throws IllegalStateException if arena's session is not {@linkplain MemorySession#isAlive() alive}. src/java.base/share/classes/java/lang/foreign/Arena.java line 121: > 119: * segments associated with that memory session are also released. > 120: * @throws IllegalStateException if the session associated with this arena is not {@linkplain MemorySession#isAlive() alive}. > 121: * @throws IllegalStateException if this session is {@linkplain MemorySession#whileAlive(Runnable) kept alive} by another client. Suggestion: * @throws IllegalStateException if the arena's session is not {@linkplain MemorySession#isAlive() alive}. * @throws IllegalStateException if the arena's session is {@linkplain MemorySession#whileAlive(Runnable) kept alive}. Note I removed "by another client". I wanted to say "by another thread", but then there is the case of calling close from within the Runnable passed to whileAlive, so I wanted to say "by another caller". But I think this can all be implied and we don't need to say anything. src/java.base/share/classes/java/lang/foreign/MemorySession.java line 66: > 64: * is not critical, or in unstructured cases where the boundaries of the lifetime associated with a memory session > 65: * cannot be easily determined. As shown in the example above, a memory session that is managed implicitly cannot end > 66: * if a program references to one or more segments associated with that session.
This means that memory segments associated Suggestion: * if a program references one or more segments associated with that session. This means that memory segments associated src/java.base/share/classes/java/lang/foreign/MemorySession.java line 89: > 87: > 88: /** > 89: * {@return {@code true} if the provided thread can access and/or obtain segments associated with this memory session} Is the following accurate and more concise? Suggestion: * {@return {@code true} if the provided thread can access and/or associate segments with this memory session} ------------- PR: https://git.openjdk.org/jdk/pull/10872 From duke at openjdk.org Fri Nov 11 01:14:05 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 11 Nov 2022 01:14:05 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 
86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: live review with Sandhya ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/196ee35b..835fbe3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=12-13 Stats: 32 lines in 3 files changed: 17 ins; 5 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From sviswanathan at openjdk.org Fri Nov 11 01:15:50 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 11 Nov 2022 01:15:50 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> Message-ID: <u8IaTPa5e5J-O6dtaDV9-zOuUUHPoQ3AchQ6qLSS2dM=.a38d7196-3013-4b39-9041-fed05a7eaf53@github.com> On Fri, 11 Nov 2022 01:14:05 GMT, 
Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > live review with Sandhya Marked as reviewed by sviswanathan (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From sviswanathan at openjdk.org Fri Nov 11 01:21:42 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 11 Nov 2022 01:21:42 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> Message-ID: <yhyVyecBWACPQcMw06SAb-j6RE57JMuoaphKaXnsWOY=.6ee2397b-c367-403b-af10-4da77da67171@github.com> On Fri, 11 Nov 2022 01:14:05 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 
1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > live review with Sandhya The PR looks good to me. @ascarpino Please let us know if the Java side changes look good to you. @iwanowww Please let us know if the compiler side changes look good to you. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Fri Nov 11 01:47:41 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Nov 2022 01:47:41 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> Message-ID: <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> On Fri, 11 Nov 2022 01:14:05 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. 
I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > live review with Sandhya Overall, it looks good. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 733: > 731: void andptr(Register src1, Register src2) { LP64_ONLY(andq(src1, src2)) NOT_LP64(andl(src1, src2)) ; } > 732: > 733: #ifdef _LP64 Why is it x64-specific? src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 161: > 159: const XMMRegister P2_H = xmm5; > 160: const XMMRegister TMP1 = xmm6; > 161: const Register polyCP = r13; Could be renamed to `rscratch` (or `tmp`) since it doesn't hold constant base address anymore. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Fri Nov 11 01:47:41 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Nov 2022 01:47:41 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11] In-Reply-To: <jvw805Op3RYdMVfMo_5yaK6RIEXNaLrKwPpSbB5leOI=.00976a93-ff6b-4d51-ac49-33b253fcd47a@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <actl_ZA8F_vYPeNjov0QcGklPyBQjm-geVl1rPLtFpU=.c440119c-5b4e-4c98-9a57-4fa36785bcbf@github.com> <Jfk4uciCLP7dRECbUamzrwiLx1QFOjVUA5zXdSL8YCA=.2304f245-f6c2-4337-990e-b05749628b0f@github.com> <jvw805Op3RYdMVfMo_5yaK6RIEXNaLrKwPpSbB5leOI=.00976a93-ff6b-4d51-ac49-33b253fcd47a@github.com> Message-ID: <CiSy3vj4B8bpA2sKwl4_wy3aLJimuejwiOR0VyHmTzU=.0141180c-780c-49fd-8ce5-b5879a629242@github.com> On Thu, 10 Nov 2022 22:41:31 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 249: >> >>> 247: @ForceInline >>> 248: @IntrinsicCandidate >>> 249: private void processMultipleBlocks(byte[] input, int offset, int length, long[] aLimbs, long[] rLimbs) { >> >> A comment here to indicate aLimbs and rLimbs are part of a and r and used in intrinsic. > > done Overall, it looks weird to see aLimbs/rLimbs being unused, but I see why it is so. If security folks are fine with that, I'm OK with it as well. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Fri Nov 11 01:47:42 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Nov 2022 01:47:42 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v13] In-Reply-To: <AdNrtqNSMPZSK_Tfbr4q-YNlelgx8AEVeDtn-4EoV6Y=.15cead8f-06fd-40bd-9e39-fbb7ecc2cbd6@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <AdNrtqNSMPZSK_Tfbr4q-YNlelgx8AEVeDtn-4EoV6Y=.15cead8f-06fd-40bd-9e39-fbb7ecc2cbd6@github.com> Message-ID: <S8GnNjm6Y0ggYmpkr3b0rCYNnOIm0zyZoUqwGA-MAbk=.4fffdc3e-e5f3-498c-84c0-3c3719cc9e14@github.com> On Thu, 10 Nov 2022 22:59:52 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 
154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > jcheck src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 252: > 250: private void processMultipleBlocks(byte[] input, int offset, int length, long[] aLimbs, long[] rLimbs) { > 251: while (length >= BLOCK_LENGTH) { > 252: n.setValue(input, offset, BLOCK_LENGTH, (byte)0x01); You could call `processBlock(input, offset, BLOCK_LENGTH);` here. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Fri Nov 11 02:24:21 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Nov 2022 02:24:21 GMT Subject: RFR: 8294033: x86_64: libm stubs are missing Message-ID: <PujwGQBG3G_aqx78tLTBHSsRi8EcbELG8Uwe1T-q1Ns=.2f550f8d-c8fb-4e3b-ac7a-f9d68aed0ee0@github.com> There's a regression from [JDK-8293285](https://bugs.openjdk.org/browse/JDK-8293285) refactoring where I forgot to call generate_libm_stubs() during stub initialization phase. The patch restores proper stub init sequence and also piles some minor refactorings on top. Testing: hs-tier1 - hs-tier2 Thanks! 
------------- Commit messages: - 8294033: x86_64: libm stubs are missing Changes: https://git.openjdk.org/jdk/pull/11100/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11100&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294033 Stats: 25 lines in 2 files changed: 6 ins; 9 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/11100.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11100/head:pull/11100 PR: https://git.openjdk.org/jdk/pull/11100 From jvernee at openjdk.org Fri Nov 11 02:53:03 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 11 Nov 2022 02:53:03 GMT Subject: RFR: 8294033: x86_64: libm stubs are missing In-Reply-To: <PujwGQBG3G_aqx78tLTBHSsRi8EcbELG8Uwe1T-q1Ns=.2f550f8d-c8fb-4e3b-ac7a-f9d68aed0ee0@github.com> References: <PujwGQBG3G_aqx78tLTBHSsRi8EcbELG8Uwe1T-q1Ns=.2f550f8d-c8fb-4e3b-ac7a-f9d68aed0ee0@github.com> Message-ID: <Fcyg9epX-IN3A0dHwlV2VBChnTWjiISLjqThe6Pr8kk=.fd1c12da-8bcb-458f-8579-f871897251a1@github.com> On Fri, 11 Nov 2022 02:07:22 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: > There's a regression from [JDK-8293285](https://bugs.openjdk.org/browse/JDK-8293285) refactoring where I forgot to call generate_libm_stubs() during stub initialization phase. > > The patch restores proper stub init sequence and also piles some minor refactorings on top. > > Testing: hs-tier1 - hs-tier2 > > Thanks! Marked as reviewed by jvernee (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/11100 From fyang at openjdk.org Fri Nov 11 02:59:30 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 11 Nov 2022 02:59:30 GMT Subject: RFR: 8296301: Interpreter(RISC-V): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v2] In-Reply-To: <XnhDHudl-5P-rb0pGwK5nGiZmgmu3mK7HcRl5XUGz3Q=.05d5e32c-6888-4ede-8c62-b01cb1efe398@github.com> References: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com> <XnhDHudl-5P-rb0pGwK5nGiZmgmu3mK7HcRl5XUGz3Q=.05d5e32c-6888-4ede-8c62-b01cb1efe398@github.com> Message-ID: <wCWq6yE5G40RuLkkKJeL_ULNzPc3NkFlMwivyRpPtLY=.74fe5985-4722-42b8-878f-c82490b55c80@github.com> On Thu, 10 Nov 2022 12:36:36 GMT, Yanhong Zhu <yzhu at openjdk.org> wrote: >> In this patch, count_bytecode() is modified by using "x7" as temporary register. Also implement histogram_bytecode() and histogram_bytecode_pair(), which can be enabled on debug mode by setting the options PrintBytecodeHistogram and PrintBytecodePairHistogram. >> >> The following is the output when PrintBytecodeHistogram or PrintBytecodePairHistogram is TRUE. 
>> >> $ java -XX:+PrintBytecodeHistogram --version|head -n 20 >> openjdk 20 2022-11-09 >> OpenJDK Runtime Environment (fastdebug build 20) >> OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode) >> >> Histogram of 8101142 executed bytecodes: >> >> absolute relative code name >> ---------------------------------------------------------------------- >> 634592 7.83% dc fast_aload_0 >> 471840 5.82% b6 invokevirtual >> 376275 4.64% 2b aload_1 >> 358520 4.43% e0 fast_iload >> 332267 4.10% de fast_aaccess_0 >> 270189 3.34% a7 goto >> 249831 3.08% 19 aload >> 223361 2.76% b9 invokeinterface >> 215666 2.66% 1c iload_2 >> 194877 2.41% b8 invokestatic >> 192212 2.37% 2c aload_2 >> 185826 2.29% 1b iload_1 >> >> $ java -XX:+PrintBytecodePairHistogram --version|head -n 20 >> openjdk 20 2022-11-09 >> OpenJDK Runtime Environment (fastdebug build 20) >> OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode) >> >> Histogram of 7627721 executed bytecode pairs: >> >> absolute relative codes 1st bytecode 2nd bytecode >> ---------------------------------------------------------------------- >> 102673 1.346% 84 a7 iinc goto >> 85429 1.120% dc 2b fast_aload_0 aload_1 >> 84394 1.106% dc b6 fast_aload_0 invokevirtual >> 73131 0.959% b7 dc invokespecial fast_aload_0 >> 64605 0.847% 2b b6 aload_1 invokevirtual >> 64086 0.840% dc b9 fast_aload_0 invokeinterface >> 63663 0.835% b6 dc invokevirtual fast_aload_0 >> 59946 0.786% b6 de invokevirtual fast_aaccess_0 >> 56631 0.742% 36 e0 istore fast_iload >> 51261 0.672% b9 de invokeinterface fast_aaccess_0 >> 49556 0.650% 3a 19 astore aload >> 49106 0.644% a7 e0 goto fast_iload >> >> The items in column "relative" are equal to percentages of bytecodes in the result of the TraceBytecodes option(counting bytecodes manually). > > Yanhong Zhu has updated the pull request incrementally with one additional commit since the last revision: > > replace atomic_addalw with atomic_addw Updated change looks good. Thanks. 
------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/11051 From kvn at openjdk.org Fri Nov 11 05:24:39 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 11 Nov 2022 05:24:39 GMT Subject: RFR: 8294033: x86_64: libm stubs are missing In-Reply-To: <PujwGQBG3G_aqx78tLTBHSsRi8EcbELG8Uwe1T-q1Ns=.2f550f8d-c8fb-4e3b-ac7a-f9d68aed0ee0@github.com> References: <PujwGQBG3G_aqx78tLTBHSsRi8EcbELG8Uwe1T-q1Ns=.2f550f8d-c8fb-4e3b-ac7a-f9d68aed0ee0@github.com> Message-ID: <GAKXxh134NjCjlA7ln0Vkw_JRdJnedzen3sySzWBF_I=.d978bdbb-301b-4dfa-94f7-a7aa92e64c7e@github.com> On Fri, 11 Nov 2022 02:07:22 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: > There's a regression from [JDK-8293285](https://bugs.openjdk.org/browse/JDK-8293285) refactoring where I forgot to call generate_libm_stubs() during stub initialization phase. > > The patch restores proper stub init sequence and also piles some minor refactorings on top. > > Testing: hs-tier1 - hs-tier2 > > Thanks! Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/11100 From stuefe at openjdk.org Fri Nov 11 06:09:28 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 11 Nov 2022 06:09:28 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled In-Reply-To: <A43rcL2ECFfsPx0O49yVfdBLe6DtEVGBaHguxjI2cXw=.d079db0f-9d81-4a15-a883-ceb8c4640cb1@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <A43rcL2ECFfsPx0O49yVfdBLe6DtEVGBaHguxjI2cXw=.d079db0f-9d81-4a15-a883-ceb8c4640cb1@github.com> Message-ID: <XwCMRwip_lxSPzKYlAGZ4PB6jfEnbeCjhL4DcHSpaUQ=.dbdc0564-bcf6-47fa-b10a-cba948527537@github.com> On Thu, 10 Nov 2022 20:41:49 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. 
>> >> The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : >> >> >> #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(0) : NativeCallStack::empty_stack()) >> #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(1) : NativeCallStack::empty_stack()) >> >> >> and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc: >> >> >> void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack); >> >> >> In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()). >> >> However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots): >> >> >> 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>: >> ... >> # Load tracking level >> cb9a77: 48 8d 1d 02 35 78 00 lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE> >> cb9a7e: 8b 03 mov (%rbx),%eax >> # detail (3) tracking? 
>> cb9a80: 83 f8 03 cmp $0x3,%eax >> # yes: go and collect callstack >> cb9a83: 0f 84 57 01 00 00 je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180> >> # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals: >> cb9a89: 48 8d 05 30 44 78 00 lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE> >> cb9a90: f3 0f 6f 00 movdqu (%rax),%xmm0 >> cb9a94: f3 0f 6f 48 10 movdqu 0x10(%rax),%xmm1 >> cb9a99: 0f 11 45 c0 movups %xmm0,-0x40(%rbp) >> cb9a9d: 0f 11 4d d0 movups %xmm1,-0x30(%rbp) >> ... >> # do the actual malloc: >> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> >> >> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX): >> cb9b0f: 48 8d 4d c0 lea -0x40(%rbp),%rcx >> ... >> cb9b19: e8 f2 b7 f3 ff callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled. >> >> --------------------- >> >> The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only. >> >> This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC. >> >> In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway: >> >> >> 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>: >> ... 
>> # load tracking level >> cb9907: 48 8d 1d 72 46 78 00 lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE> >> cb990e: 8b 03 mov (%rbx),%eax >> # detail (3) tracking? >> cb9910: 83 f8 03 cmp $0x3,%eax >> # yes: go and collect callstack >> cb9913: 0f 84 37 01 00 00 je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160> >> # no: nothing more to do ... >> ... >> # do the actual malloc: >> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> >> ... >> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway: >> cb9987: 48 8d 4d c0 lea -0x40(%rbp),%rcx >> .. >> cb9991: e8 ba b8 f3 ff callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent. >> >> -------------- >> >> Results: >> >> When profiling, I see os::malloc now needs less cycles, and the hotspot around the xmm instructions is not there anymore. > > (I commented before but for some reason it's been lost). > > There's one use of the default constructor that you've missed (I found that by removing the body of the NativeCallStack() constructor): > > > ReservedMemoryRegion(const ReservedMemoryRegion& rr) : > VirtualMemoryRegion(rr.base(), rr.size()) { > *this = rr; > } > > > I think it will be much safer to leave the existing default constructor, and have something like: > > > private: > NativeCallStack(int dummy) { > _dummy[0] = NULL; > } > > public: > inline static NativeCallStack fake_stack() { > NativeCallStack fake(0); > return fake; > } > > > This will keep the behavior the same as before. Note that your patch will change the behavior if the fake stack is actually used. 
E.g., for this function: > > > inline bool is_empty() const { > return _stack[0] == NULL; > } > > > If you are absolutely sure that the fake stacks are never used, and really want to get rid of the `_stack[0] = NULL`, I would suggest adding a new debug-only field, and add asserts like this in all public functions: > > > inline bool is_empty() const { > assert(!is_fake, "sanity"); > return _stack[0] == NULL; > } Hi @iklam, > (I commented before but for some reason it's been lost). > This seems to happen more recently. > There's one use of the default constructor that you've missed (I found that by removing the body of the NativeCallStack() constructor): > > ``` > ReservedMemoryRegion(const ReservedMemoryRegion& rr) : > VirtualMemoryRegion(rr.base(), rr.size()) { > *this = rr; > } > ``` > > I think it will be much safer to leave the existing default constructor, and have something like: > > ``` > private: > NativeCallStack(int dummy) { > _dummy[0] = NULL; > } > > public: > inline static NativeCallStack fake_stack() { > NativeCallStack fake(0); > return fake; > } > ``` > I thought about this but did not want to pay for the argument and the using of it. But maybe it would be optimized away. I'll experiment a bit more. ------------- PR: https://git.openjdk.org/jdk/pull/11040 From stefank at openjdk.org Fri Nov 11 06:47:34 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 11 Nov 2022 06:47:34 GMT Subject: RFR: 8296785: Use realloc for CHeap-allocated BitMaps Message-ID: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> Today CHeap allocated bitmaps don't resize with realloc. I'd like to change that by fixing that by adding support for realloc in the ArrayAllocator classes, and then use that when resizing the bitmaps. We've been using and testing one version of this patch in the Generational ZGC repository for a while now. 
That version is slightly different because of recent rewrites of the bitmaps, but in essence the same. See: https://github.com/openjdk/zgc/commit/ca692f686bda8d86d3786c2afc782bfdc54fbdfc ------------- Depends on: https://git.openjdk.org/jdk/pull/11084 Commit messages: - 8296785: Use realloc for CHeap-allocated BitMaps Changes: https://git.openjdk.org/jdk/pull/11102/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11102&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296785 Stats: 227 lines in 5 files changed: 174 ins; 31 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/11102.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11102/head:pull/11102 PR: https://git.openjdk.org/jdk/pull/11102 From xliu at openjdk.org Fri Nov 11 07:39:30 2022 From: xliu at openjdk.org (Xin Liu) Date: Fri, 11 Nov 2022 07:39:30 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v2] In-Reply-To: <LEoBvJqATCkwqXFQAxjjgrsKfDdOQckrsSkTNhAEGYE=.0948e37a-3480-48d1-a2b1-e1e41b54e2e1@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <LEoBvJqATCkwqXFQAxjjgrsKfDdOQckrsSkTNhAEGYE=.0948e37a-3480-48d1-a2b1-e1e41b54e2e1@github.com> Message-ID: <edpXZnM2epZ7m30I3z-r-DVa2qCkdUAcf2bCRbCFMS4=.1563084f-76c4-43b0-b1b7-773ff3672102@github.com> On Thu, 10 Nov 2022 10:15:11 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. >> >> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. >> >> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. 
>> >> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see [JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure out an initial capacity. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray It looks like no one is using mtNone for GrowableArray, but memReporter.cpp and virtualMemoryTracker still accept it. Shall we eliminate mtNone completely? src/hotspot/share/utilities/growableArray.hpp line 734: > 732: } > 733: > 734: GrowableArray(MEMFLAGS memflags, int initial_capacity = 2) : I feel it's too dangerous to have two constructors with convertible parameters. E.g. GrowableArray(int initial_capacity) and GrowableArray(MEMFLAGS memflags, int initial_capacity = 2) are very close. What if someone has code like this: short t = 2; GrowableArray(2); // it's very vague to me which ctor will be selected. I think it's a good idea to use 'explicit constructor' for one of them, in case an implicit conversion takes place unnoticed.
------------- PR: https://git.openjdk.org/jdk/pull/11086 From stefank at openjdk.org Fri Nov 11 08:15:30 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 11 Nov 2022 08:15:30 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v2] In-Reply-To: <edpXZnM2epZ7m30I3z-r-DVa2qCkdUAcf2bCRbCFMS4=.1563084f-76c4-43b0-b1b7-773ff3672102@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <LEoBvJqATCkwqXFQAxjjgrsKfDdOQckrsSkTNhAEGYE=.0948e37a-3480-48d1-a2b1-e1e41b54e2e1@github.com> <edpXZnM2epZ7m30I3z-r-DVa2qCkdUAcf2bCRbCFMS4=.1563084f-76c4-43b0-b1b7-773ff3672102@github.com> Message-ID: <X1_k3JyJDZeRZRfkybRomwV69_ZaHx9CB6l4GfBLiyo=.5ce268a4-0068-456d-bb55-8a3a47a9d628@github.com> On Fri, 11 Nov 2022 07:33:46 GMT, Xin Liu <xliu at openjdk.org> wrote: >> Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray > > src/hotspot/share/utilities/growableArray.hpp line 734: > >> 732: } >> 733: >> 734: GrowableArray(MEMFLAGS memflags, int initial_capacity = 2) : > > I feel it's too dangerous to have two constructors with convertible parameters. > E.g. GrowableArray(int initial_capacity) and GrowableArray(MEMFLAGS memflags, int initial_capacity = 2) are very close. What if someone has code like this: > > short t = 2; > GrowableArray(2); // it's very vague to me which ctor will be selected. > > > I think it's a good idea to use 'explicit constructor' for one of them, in case an implicit conversion takes place unnoticed. MEMFLAGS is defined as an enum class, so integers will not be automatically converted to MEMFLAGS. I'll add explicit to the constructors.
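The point about scoped enums can be checked with a small standalone sketch. Everything named here is a hypothetical stand-in: `MemTag` and `DemoArray` are illustration-only types, not the HotSpot `MEMFLAGS` or `GrowableArray` declarations, and the two constructors only mimic the shape of the overloads discussed above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a scoped memory-tag enum (not HotSpot's MEMFLAGS).
enum class MemTag { mtInternal, mtGC };

// Hypothetical stand-in with the same overload shape as discussed above.
struct DemoArray {
  bool on_cheap;   // true when the tag-taking constructor was selected
  int  capacity;

  explicit DemoArray(int initial_capacity)
      : on_cheap(false), capacity(initial_capacity) {}

  explicit DemoArray(MemTag, int initial_capacity = 2)
      : on_cheap(true), capacity(initial_capacity) {}
};

// Integers never convert implicitly to a scoped enum, so an int or short
// argument cannot select the MemTag overload by accident.
static_assert(!std::is_convertible<int, MemTag>::value, "no int -> MemTag");
static_assert(!std::is_convertible<short, MemTag>::value, "no short -> MemTag");

// 'explicit' additionally rules out implicit int -> DemoArray conversions,
// while direct construction from an integer still works.
static_assert(!std::is_convertible<int, DemoArray>::value, "explicit ctor");
static_assert(std::is_constructible<DemoArray, short>::value, "direct init ok");

inline DemoArray from_short() {
  short t = 7;
  return DemoArray(t);             // short promotes to int: the int overload
}

inline DemoArray from_tag() {
  return DemoArray(MemTag::mtGC);  // tag overload, capacity defaults to 2
}

inline void demo() {
  assert(!from_short().on_cheap && from_short().capacity == 7);
  assert(from_tag().on_cheap && from_tag().capacity == 2);
}
```

Calling `demo()` exercises both paths; the `static_assert` lines fail at compile time if the enum were changed to an unscoped one.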
------------- PR: https://git.openjdk.org/jdk/pull/11086 From yadongwang at openjdk.org Fri Nov 11 08:34:10 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Fri, 11 Nov 2022 08:34:10 GMT Subject: RFR: 8296630: Fix SkipIfEqual on AArch64 and RISC-V [v2] In-Reply-To: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com> References: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com> Message-ID: <xg2p4-Migan3pQupzs9MpDHI3Blb6OZqNC97UM-SfRw=.3d950c93-b9b8-4e09-aa49-b08fd3a31194@github.com> > SkipIfEqual was supposed to load a flag value from some memory, compare it with an input boolean value, and jump to a specific label if they are equal. The implementation on x86 and s390 platforms meets expectations, and ppc uses SkipIfEqualZero. However, on AArch64 and RISC-V platforms, the input argument "value" is not used, and only a jump-if-equal-to-zero is generated. That's not correct, but works well since only false is passed at all call sites so far. > > AArch64 tier1, riscv hotspot & jdk tier1 have been tested.
> Additional cases with dtrace tested on AArch64: > test/hotspot/jtreg/serviceability/dtrace/DTraceOptionsTest.java > test/hotspot/jtreg/compiler/runtime/Test8168712.java Yadong Wang has updated the pull request incrementally with one additional commit since the last revision: reverse branch order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11076/files - new: https://git.openjdk.org/jdk/pull/11076/files/f927182d..8efa3014 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11076&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11076&range=00-01 Stats: 10 lines in 2 files changed: 4 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11076.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11076/head:pull/11076 PR: https://git.openjdk.org/jdk/pull/11076 From stefank at openjdk.org Fri Nov 11 08:47:39 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 11 Nov 2022 08:47:39 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v3] In-Reply-To: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> Message-ID: <HzvC0hEgl-NTZ-I36novjuStycPmJG38kdYEA4DAGgE=.eee08df7-8725-423a-ba53-2cf1483ff6be@github.com> > Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. > > I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. > > This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. > > Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see JDK-[JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. 
When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure out an initial capacity. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Mark constructors explicit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11086/files - new: https://git.openjdk.org/jdk/pull/11086/files/9e398353..7b43d04a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11086.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11086/head:pull/11086 PR: https://git.openjdk.org/jdk/pull/11086 From aph at openjdk.org Fri Nov 11 08:50:41 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 11 Nov 2022 08:50:41 GMT Subject: RFR: 8296630: Fix SkipIfEqual on AArch64 and RISC-V [v2] In-Reply-To: <xg2p4-Migan3pQupzs9MpDHI3Blb6OZqNC97UM-SfRw=.3d950c93-b9b8-4e09-aa49-b08fd3a31194@github.com> References: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com> <xg2p4-Migan3pQupzs9MpDHI3Blb6OZqNC97UM-SfRw=.3d950c93-b9b8-4e09-aa49-b08fd3a31194@github.com> Message-ID: <EqzxPb1adkwlquIyxVPMwVCvaLnXLNebsSblYTrIbVw=.8dcde5d5-2e1f-4aec-aa3b-a48beb1cb843@github.com> On Fri, 11 Nov 2022 08:34:10 GMT, Yadong Wang <yadongwang at openjdk.org> wrote: >> SkipIfEqual was supposed to load a flag value from some memory, compare it with an input boolean value, and jump to a specific label if they are equal. The implementation on x86 and s390 platforms meets expectations, and ppc uses SkipIfEqualZero. However, on AArch64 and RISC-V platforms, the input argument "value" is not used, and only a jump-if-equal-to-zero is generated.
That's not correct, but works well since only false is passed at all call sites so far. >> >> AArch64 tier1, riscv hotspot & jdk tier1 have been tested. >> Additional cases with dtrace tested on AArch64: >> test/hotspot/jtreg/serviceability/dtrace/DTraceOptionsTest.java >> test/hotspot/jtreg/compiler/runtime/Test8168712.java > > Yadong Wang has updated the pull request incrementally with one additional commit since the last revision: > > reverse branch order Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11076 From stefank at openjdk.org Fri Nov 11 11:46:32 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 11 Nov 2022 11:46:32 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v2] In-Reply-To: <edpXZnM2epZ7m30I3z-r-DVa2qCkdUAcf2bCRbCFMS4=.1563084f-76c4-43b0-b1b7-773ff3672102@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <LEoBvJqATCkwqXFQAxjjgrsKfDdOQckrsSkTNhAEGYE=.0948e37a-3480-48d1-a2b1-e1e41b54e2e1@github.com> <edpXZnM2epZ7m30I3z-r-DVa2qCkdUAcf2bCRbCFMS4=.1563084f-76c4-43b0-b1b7-773ff3672102@github.com> Message-ID: <TTf5a3g2sxBNMixakrV-0bu54Ty6bMDbhzaW1s_wh-s=.69781267-0a1f-42a0-8a80-14c4ded0f266@github.com> On Fri, 11 Nov 2022 07:37:25 GMT, Xin Liu <xliu at openjdk.org> wrote: > It looks like no one is using mtNone for GrowableArray, but memReporter.cpp and virtualMemoryTracker still accept it. Shall we eliminate mtNone completely? Eliminating mtNone completely is out of scope for this PR.
------------- PR: https://git.openjdk.org/jdk/pull/11086 From redestad at openjdk.org Fri Nov 11 12:34:34 2022 From: redestad at openjdk.org (Claes Redestad) Date: Fri, 11 Nov 2022 12:34:34 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v12] In-Reply-To: <4DATzQcc3E5BBS0xrbxkKDyI64Lt-vpKvtgTGDh6Rew=.5bb45e2c-65bd-4c38-9a30-47feac3a32ca@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <4DATzQcc3E5BBS0xrbxkKDyI64Lt-vpKvtgTGDh6Rew=.5bb45e2c-65bd-4c38-9a30-47feac3a32ca@github.com> Message-ID: <n_LTLl65bwol4k86jX4SE-QEQpypao-2CDIgs0FzvW0=.0b2fdaee-b1c5-4d71-b5b5-6721dd654489@github.com> On Thu, 10 Nov 2022 15:03:26 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ±
0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ±
62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Qualified guess on shenandoahSupport fix-up The test failures in GHA are unrelated. Passed tier1-tier3 in our CI. Full benchmark results pending.
------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Fri Nov 11 12:43:12 2022 From: redestad at openjdk.org (Claes Redestad) Date: Fri, 11 Nov 2022 12:43:12 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v12] In-Reply-To: <QmiDww20tXO8mk6NYBQCoh4KzZgsygyqgZ-xi9bXSIE=.07957333-e9ce-4031-a858-bcf131071d0c@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> <QmiDww20tXO8mk6NYBQCoh4KzZgsygyqgZ-xi9bXSIE=.07957333-e9ce-4031-a858-bcf131071d0c@github.com> Message-ID: <KHazADPPXEQSY_dmjll8s9vUb_SOwGI_0muZP4Nw-54=.f745b270-1e86-458d-b922-61c6df10115f@github.com> On Mon, 31 Oct 2022 12:25:43 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3484: >> >>> 3482: decrementl(index); >>> 3483: jmpb(LONG_SCALAR_LOOP_BEGIN); >>> 3484: bind(LONG_SCALAR_LOOP_END); >> >> You can share this loop with the scalar ones above. > > This might be messier than it first looks, since the two different loops use different temp registers (long scalar can scratch cnt1, short scalar scratches the coef register). I'll have to think about this for a bit. As it happens, in the latest version the vector loop drops into the scalar loop after all 32-element chunks have been processed.
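The "chunked loop, then fall into the scalar loop" structure can be sketched in plain C++. This is only an illustration of the control flow, under stated assumptions: it uses a chunk of 4 elements rather than the 32-element chunks mentioned above, it is ordinary scalar code rather than the x86 intrinsic, and the function names are hypothetical. Unsigned 32-bit arithmetic stands in for Java's wrapping `int` arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference loop: h = 31 * h + a[i], wrapping modulo 2^32.
// The start value h would be 0 for the String case and 1 for the Arrays case.
inline uint32_t poly_hash_scalar(const uint8_t* a, size_t n, uint32_t h = 0) {
  for (size_t i = 0; i < n; i++) {
    h = 31u * h + a[i];
  }
  return h;
}

// Chunked variant (chunk size 4 is an assumption for illustration):
// each step folds four elements with precomputed powers of 31, then the
// remaining 0..3 elements are handled by the same scalar loop as above.
inline uint32_t poly_hash_chunked(const uint8_t* a, size_t n, uint32_t h = 0) {
  constexpr uint32_t P1 = 31u;
  constexpr uint32_t P2 = P1 * 31u;
  constexpr uint32_t P3 = P2 * 31u;
  constexpr uint32_t P4 = P3 * 31u;

  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    h = h * P4 + a[i] * P3 + a[i + 1] * P2 + a[i + 2] * P1 + a[i + 3];
  }
  // Drop into the scalar loop for the tail, carrying the running hash.
  return poly_hash_scalar(a + i, n - i, h);
}

inline void demo_poly_hash() {
  const uint8_t abc[] = {'a', 'b', 'c'};
  // 97 * 31^2 + 98 * 31 + 99 = 96354, the value of "abc".hashCode() in Java.
  assert(poly_hash_scalar(abc, 3) == 96354u);
  assert(poly_hash_chunked(abc, 3) == 96354u);
}
```

Sharing the tail loop this way keeps a single scalar code path, at the cost of the chunked loop having to leave its running hash in the state the scalar loop expects.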
------------- PR: https://git.openjdk.org/jdk/pull/10847 From dfuchs at openjdk.org Fri Nov 11 12:43:14 2022 From: dfuchs at openjdk.org (Daniel Fuchs) Date: Fri, 11 Nov 2022 12:43:14 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v12] In-Reply-To: <4DATzQcc3E5BBS0xrbxkKDyI64Lt-vpKvtgTGDh6Rew=.5bb45e2c-65bd-4c38-9a30-47feac3a32ca@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <4DATzQcc3E5BBS0xrbxkKDyI64Lt-vpKvtgTGDh6Rew=.5bb45e2c-65bd-4c38-9a30-47feac3a32ca@github.com> Message-ID: <wVgCYIlIR3-QrRh8lKgZ860h2dwG1gnrbypR2wYhqa4=.8afae65f-0241-4140-866a-077d460816ce@github.com> On Thu, 10 Nov 2022 15:03:26 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ±
0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 
62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. 
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Qualified guess on shenandoahSupport fix-up

src/java.base/share/classes/java/lang/StringLatin1.java line 194:

> 192:         return switch (value.length) {
> 193:             case 0 -> 0;
> 194:             case 1 -> value[0];

shouldn't that be:

    case 1 -> value[0] & 0xff;

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Fri Nov 11 12:43:15 2022
From: redestad at openjdk.org (Claes Redestad)
Date: Fri, 11 Nov 2022 12:43:15 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v12]
In-Reply-To: <wVgCYIlIR3-QrRh8lKgZ860h2dwG1gnrbypR2wYhqa4=.8afae65f-0241-4140-866a-077d460816ce@github.com>
References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com>
 <4DATzQcc3E5BBS0xrbxkKDyI64Lt-vpKvtgTGDh6Rew=.5bb45e2c-65bd-4c38-9a30-47feac3a32ca@github.com>
 <wVgCYIlIR3-QrRh8lKgZ860h2dwG1gnrbypR2wYhqa4=.8afae65f-0241-4140-866a-077d460816ce@github.com>
Message-ID: <GX1iGvQzagbe4uX-SEDljBp_iyRe8yzA57va53OLQfw=.04f72a13-e887-4962-8e24-a88bd7121e93@github.com>

On Fri, 11 Nov 2022 12:36:20 GMT, Daniel Fuchs <dfuchs at openjdk.org> wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Qualified guess on shenandoahSupport fix-up
>
> src/java.base/share/classes/java/lang/StringLatin1.java line 194:
>
>> 192:         return switch (value.length) {
>> 193:             case 0 -> 0;
>> 194:             case 1 -> value[0];
>
> shouldn't that be:
>
>     case 1 -> value[0] & 0xff;

Yes, good catch. I'll add a test case for negative latin1 bytes, too.
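The reason the mask matters is that Java's `byte` is signed, so a Latin-1 code point of 0x80 or above widens to a negative `int` unless it is masked. A minimal sketch of the difference follows; the class and method names are made up for the illustration and do not come from the patch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the review comment; not code from the PR.
final class Latin1SignSketch {
    // What the unmasked case would return: the byte is sign-extended to int.
    static int singleUnmasked(byte[] value) {
        return value[0];
    }

    // What the masked case returns: the byte is treated as unsigned 0..255.
    static int singleMasked(byte[] value) {
        return value[0] & 0xff;
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // 'é', Latin-1 code point 0xE9 = 233
        byte[] value = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(s.hashCode());          // 233
        System.out.println(singleUnmasked(value)); // -23
        System.out.println(singleMasked(value));   // 233
    }
}
```

Only the masked variant agrees with `String.hashCode()` for single-character strings whose Latin-1 byte has the high bit set.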
-------------

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Fri Nov 11 13:00:06 2022
From: redestad at openjdk.org (Claes Redestad)
Date: Fri, 11 Nov 2022 13:00:06 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]
In-Reply-To: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com>
References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com>
Message-ID: <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com>

> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> [...]

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Missing & 0xff in StringLatin1::hashCode

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10847/files
  - new: https://git.openjdk.org/jdk/pull/10847/files/871f6cef..f08a656c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=11-12

  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From duke at openjdk.org Fri Nov 11 13:25:35 2022
From: duke at openjdk.org (Piotr Tarsa)
Date: Fri, 11 Nov 2022 13:25:35 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]
In-Reply-To: <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com>
References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com>
 <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com>
Message-ID: <kpo1s3-rkAk840MoqtII6K5uwXnV3U4bBiHx9ktQkS4=.50455cc3-5aae-4975-8459-f9ab9ed8c112@github.com>

On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> [...]
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Missing & 0xff in StringLatin1::hashCode

I think that microbenchmarking the string and array hash code computation with fixed lengths is hiding branch misprediction penalties, and they can be pretty high (double digits of cycles lost), even on modern high performance CPU cores that have a relatively short pipeline (compared to e.g. Pentium 4). Real world scenarios will probably entail varying, unpredictable, but still short string lengths, so that should be reflected in microbenchmarks and also be given high importance. I see you've added benchmarks like that already: https://github.com/openjdk/jdk/pull/10847/files#diff-0b5a3d8f2d9f485100f701d0917ffac9cf090a023055398154fa9ef1a9681b64R126-R156 (multibytes, multiints, etc.) but you don't report on their measurements. Could you add their results? Thanks.
-------------

PR: https://git.openjdk.org/jdk/pull/10847

From eosterlund at openjdk.org Fri Nov 11 13:35:41 2022
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Fri, 11 Nov 2022 13:35:41 GMT
Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code [v2]
In-Reply-To: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com>
References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com>
Message-ID: <jtljrsUwxNc-va1NxAJJd4ozBrJ08ZWeItjn8av-D8s=.8e02184a-02d3-4f09-9257-86cabd685594@github.com>

> Generational ZGC will need to patch nmethod instructions outside of safepoints, and guard entries into the nmethods with cross modifying code fences. This is mostly taken care of by nmethod entry barrier code. But there are a few entries that don't go through nmethod entry barriers that need fixing. In particular when entering an nmethod by returning through the stack watermark barrier. This patch ensures that whenever the stack watermark barrier exposes a new nmethod, we also ensure that a cross modify fence is executed, so that any concurrently updated instructions can be safely executed.

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  Update comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/11042/files
  - new: https://git.openjdk.org/jdk/pull/11042/files/b1910e26..d09f094a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11042&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11042&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/11042.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11042/head:pull/11042

PR: https://git.openjdk.org/jdk/pull/11042

From eosterlund at openjdk.org Fri Nov 11 13:35:42 2022
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Fri, 11 Nov 2022 13:35:42 GMT
Subject: RFR: 8295214: Generational ZGC: Guard nmethods from cross modifying code
In-Reply-To: <QqWbNSeUCN8McKIpJNkiJ3eT22wa4pohGfSUyqu8J1U=.d0219b4c-944e-4086-a6de-82606ce2f053@github.com>
References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com>
 <yIaaqK4Eqwty8bn3dSjhEJXoKA3SNXM2pOO6NovgSho=.ccd162ae-5a19-4e17-ab3a-a05114a9a0ed@github.com>
 <3znrZnIih9b9Y8_6jkcCV_TPZi5N4UNUTzqDAP67Mug=.f48d2593-8fdc-4119-87fc-e53ad59c5cdd@github.com>
 <QqWbNSeUCN8McKIpJNkiJ3eT22wa4pohGfSUyqu8J1U=.d0219b4c-944e-4086-a6de-82606ce2f053@github.com>
Message-ID: <WUEVYVFS8ZirUuhLtbjIOcg9kIye1U04EcvrJ9CrJCo=.4acb02dc-31e0-4c94-95d7-b8dea74fd808@github.com>

On Wed, 9 Nov 2022 12:33:28 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>>> I think David has a point here, using e.g. non-gen ZGC returning from Fibonacci could be slowed-down. I assume gen-ZGC will only need the CMF once per safepoint? If so can we make it conditional per safepoint generation?
>>>
>>> Or make sure it is not a problem.
>>
>> I'm not quite sure what you mean by making it conditional per safepoint generation?
>
> It can mean a couple of things. My original thought was wrong, but we know that code stream oops only need processing once between the start of safepoint A and the start of safepoint B. But different nmethods can be processed at different times.
> So one CFW per nmethod in such a safepoint epoch would be the maximum number of CFW we need.
>
> This seems to map very well to "processing_completed_acquire()" "start_processing -> on_safepoint()".
>
> You are saying that using this information to do a more fine-grained CMF is not worth the trouble.
>
> I cannot directly say you are correct; assuming you are, looks good.

Thank you for the reviews, @robehn and @dholmes-ora! Just gonna fix that comment and then integrate.

-------------

PR: https://git.openjdk.org/jdk/pull/11042

From redestad at openjdk.org Fri Nov 11 13:37:33 2022
From: redestad at openjdk.org (Claes Redestad)
Date: Fri, 11 Nov 2022 13:37:33 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]
In-Reply-To: <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com>
References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com>
 <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com>
Message-ID: <nH3j86IE1_KTy_6nc3jKsGsB4rHGGpm7Wb3wAhpgt8o=.c4d22157-44f2-4cc8-af02-6dd0d98269b0@github.com>

On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> [...]
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Missing & 0xff in StringLatin1::hashCode

Yes, I had the same concern when @luhenry was obsessing about 0- and 1-element inputs and added the switches: that we might be optimizing for extremely well-predicted micros. So he added those multi* variants. The overall result on both our setups is that we behave well even with mixed inputs, and with the new intrinsics the generated code ends up in total a bit less branchy than the baseline across the range of input sizes. I'll upload full results for the multi*-micros once I have run the baseline and patched version thoroughly with no shortcuts.

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From yzhu at openjdk.org Fri Nov 11 14:00:27 2022
From: yzhu at openjdk.org (Yanhong Zhu)
Date: Fri, 11 Nov 2022 14:00:27 GMT
Subject: Integrated: 8296301: Interpreter(RISC-V): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options
In-Reply-To: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com>
References: <QCmM1v3C3Uke3E38ueOaivA2oyfxtujiK8ius2GzEBA=.7c13565b-1221-492b-9548-8a8fcee6fb99@github.com>
Message-ID: <2w7lPG0ImlDmiP0-aNa6c3hNzLrJkPvZUYL2g2Y4TM0=.b3536801-eb89-4940-b3e9-5956c411f1bb@github.com>

On Wed, 9 Nov 2022 03:08:57 GMT, Yanhong Zhu <yzhu at openjdk.org> wrote:

> In this patch, count_bytecode() is modified by using "x7" as temporary register. Also implement histogram_bytecode() and histogram_bytecode_pair(), which can be enabled on debug mode by setting the options PrintBytecodeHistogram and PrintBytecodePairHistogram.
>
> The following is the output when PrintBytecodeHistogram or PrintBytecodePairHistogram is TRUE.
>
> $ java -XX:+PrintBytecodeHistogram --version|head -n 20
> openjdk 20 2022-11-09
> OpenJDK Runtime Environment (fastdebug build 20)
> OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode)
>
> Histogram of 8101142 executed bytecodes:
>
>   absolute  relative  code    name
> ----------------------------------------------------------------------
>     634592     7.83%    dc    fast_aload_0
>     471840     5.82%    b6    invokevirtual
>     376275     4.64%    2b    aload_1
>     358520     4.43%    e0    fast_iload
>     332267     4.10%    de    fast_aaccess_0
>     270189     3.34%    a7    goto
>     249831     3.08%    19    aload
>     223361     2.76%    b9    invokeinterface
>     215666     2.66%    1c    iload_2
>     194877     2.41%    b8    invokestatic
>     192212     2.37%    2c    aload_2
>     185826     2.29%    1b    iload_1
>
> $ java -XX:+PrintBytecodePairHistogram --version|head -n 20
> openjdk 20 2022-11-09
> OpenJDK Runtime Environment (fastdebug build 20)
> OpenJDK 64-Bit Server VM (fastdebug build 20, mixed mode)
>
> Histogram of 7627721 executed bytecode pairs:
>
>   absolute  relative   codes    1st bytecode        2nd bytecode
> ----------------------------------------------------------------------
>     102673    1.346%    84 a7    iinc                goto
>      85429    1.120%    dc 2b    fast_aload_0        aload_1
>      84394    1.106%    dc b6    fast_aload_0        invokevirtual
>      73131    0.959%    b7 dc    invokespecial       fast_aload_0
>      64605    0.847%    2b b6    aload_1             invokevirtual
>      64086    0.840%    dc b9    fast_aload_0        invokeinterface
>      63663    0.835%    b6 dc    invokevirtual       fast_aload_0
>      59946    0.786%    b6 de    invokevirtual       fast_aaccess_0
>      56631    0.742%    36 e0    istore              fast_iload
>      51261    0.672%    b9 de    invokeinterface     fast_aaccess_0
>      49556    0.650%    3a 19    astore              aload
>      49106    0.644%    a7 e0    goto                fast_iload
>
> The items in column "relative" are equal to percentages of bytecodes in the result of the TraceBytecodes option (counting bytecodes manually).

This pull request has now been integrated.

Changeset: d4d183ed
Author:    Yanhong Zhu <yzhu at openjdk.org>
Committer: Fei Yang <fyang at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/d4d183edfea70a330cc5a092590f8b724fbb4259
Stats:     31 lines in 1 file changed: 22 ins; 5 del; 4 mod

8296301: Interpreter(RISC-V): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options

Reviewed-by: fjiang, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/11051

From stefank at openjdk.org Fri Nov 11 14:33:52 2022
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 11 Nov 2022 14:33:52 GMT
Subject: RFR: 8296886: Fix various include sort order issues
Message-ID: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com>

The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues.

One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that doesn't serve any practical purpose, IMHO. It also goes against our written style guide around include files. One argument why it was OK to have the files in include/ pushed up to the top of the sorted block was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO.

To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot.
While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards. ------------- Commit messages: - Various include order fixes Changes: https://git.openjdk.org/jdk/pull/11108/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11108&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296886 Stats: 839 lines in 323 files changed: 433 ins; 299 del; 107 mod Patch: https://git.openjdk.org/jdk/pull/11108.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11108/head:pull/11108 PR: https://git.openjdk.org/jdk/pull/11108 From rkennke at openjdk.org Fri Nov 11 14:37:38 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 11 Nov 2022 14:37:38 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <6KaO6YDJAQZSps49h6TddX8-aXFEfOFCfLgpi1_90Ag=.d7fe0ac9-d392-4784-a13e-85f5212e00f1@github.com> References: <mgQHdsI_oHeWVQEOQNQLrfplcvatEauNPfq1rEswJF4=.cc842c02-e7e0-4f72-95a0-1033ce101cfe@github.com> <TlQR1R0Jt3DqqMWNwZzUyRpSs7lqI0Cig2zpUnmYI3s=.e3add258-443b-4adb-9b31-9f9a76042ff4@github.com> <NDsoMk5BjB0oLGW6pQagOrm-CWrPjc-wwfRhA3vJt6g=.10119bb3-2d8b-4d53-bfaa-9f7a01dabfd7@github.com> <TCLX4fpqVeou-wQDj1SBil2xIyIzk2NFcj7pPpz_xjs=.4152872e-18aa-426b-b967-68118f7ba62d@github.com> <6KaO6YDJAQZSps49h6TddX8-aXFEfOFCfLgpi1_90Ag=.d7fe0ac9-d392-4784-a13e-85f5212e00f1@github.com> Message-ID: <J9AH8qxXd0iRgx13p-Eo5OxeoZWINDBGTsD41ijcZTo=.7259fe6e-ae46-45d2-8265-2c2b8d9e161b@github.com> On Fri, 28 Oct 2022 01:47:23 GMT, David Holmes <dholmes at openjdk.org> wrote: >> ----- Original Message ----- >>> From: "John R Rose" <jrose at openjdk.org> >>> To: hotspot-dev at openjdk.org, serviceability-dev at openjdk.org, shenandoah-dev at openjdk.org >>> Sent: Thursday, October 27, 2022 10:41:44 PM >>> Subject: Re: RFR: 8291555: Replace stack-locking with fast-locking [v7] >> >>> On Mon, 24 Oct 2022 11:01:01 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >>> >>>> Secondly, 
a question/suggestion: Many recursive cases do not interleave locks, >>>> meaning the recursive enter will happen with the lock/oop already at the top of the lock stack. >>>> Why not peek at the top lock/oop in the lock stack and, if it is the current one, just push >>>> it again, so that the locking is done (instead of inflating)? (Exit would need to >>>> check whether this is the last one and then do a proper exit.) >>> >>> The CJM paper (Dice/Kogan 2021) mentions a "nesting" counter for this purpose. >>> I suspect that a real counter is overkill, and the "unary" representation >>> Robbin mentions would be fine, especially if there were a point (when the >>> per-thread stack gets too big) at which we go and inflate anyway. >>> >>> The CJM paper suggests a full search of the per-thread array to detect the >>> recursive condition, but again I like Robbin's idea of checking only the most >>> recent lock record. >>> >>> So the data structure for lock records (per thread) could consist of a series of >>> distinct values [ A B C ] and each of the values could be repeated, but only >>> adjacently: [ A A A B C C ] for example. And there could be a depth limit as >>> well. Any sequence of held locks not expressible within those limitations >>> could go to inflation as a backup. >> >> Hi John, >> a certainly stupid question: I have some trouble seeing how it can be implemented, given that because of lock coarsening (and maybe OSR), the number of times a lock is held differs between the interpreted code and the compiled code. >> >> Rémi > >> So the data structure for lock records (per thread) could consist of a series of distinct values [ A B C ] and each of the values could be repeated, but only adjacently: [ A A A B C C ] for example. > @rose00 why only adjacently? Nested locking can be interleaved on different monitors.
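For readers following the quoted discussion, here is a minimal toy model of the "repeat only adjacently" lock-stack idea. It is a sketch under stated assumptions, not HotSpot code: the class `LockStackSketch`, the `Enter` outcomes and the `maxDepth` limit are hypothetical names chosen for illustration, and the linear scan merely stands in for whatever the VM would use to detect that a lock is already held by the current thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a per-thread lock stack (illustration only, not HotSpot code). */
public class LockStackSketch {
    /** Hypothetical outcomes of an enter attempt. */
    public enum Enter { FAST_LOCKED, RECURSIVE, INFLATE }

    private final List<Object> stack = new ArrayList<>();
    private final int maxDepth; // hypothetical depth limit before falling back to inflation

    public LockStackSketch(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    public Enter enter(Object lock) {
        if (stack.size() >= maxDepth) {
            return Enter.INFLATE; // stack too deep: fall back to an inflated monitor
        }
        boolean onTop = !stack.isEmpty() && stack.get(stack.size() - 1) == lock;
        if (onTop) {
            stack.add(lock); // "unary" recursion: push the same lock again
            return Enter.RECURSIVE;
        }
        for (Object held : stack) {
            if (held == lock) {
                return Enter.INFLATE; // interleaved re-entry, e.g. [ A B ] then A again
            }
        }
        stack.add(lock);
        return Enter.FAST_LOCKED;
    }

    /** Pops the top record; returns true if it was the last record for this lock. */
    public boolean exit(Object lock) {
        int top = stack.size() - 1;
        if (top < 0 || stack.get(top) != lock) {
            throw new IllegalStateException("unbalanced exit");
        }
        stack.remove(top);
        // Repeats are only adjacent, so checking the new top is sufficient.
        return stack.isEmpty() || stack.get(stack.size() - 1) != lock;
    }
}
```

In this model the sequence enter(A), enter(A), enter(B) yields the stack [ A A B ], while a further enter(A) is the interleaved case raised in the question above and is sent to inflation.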
@dholmes-ora and all: I have prepared an alternative PR #10907 that implements the fast-locking behind a new experimental flag, and preserves the current stack-locking behavior as the default setting. It is currently implemented and tested on x86* and aarch64 arches. It is also less invasive because it keeps everything structurally the same (i.e. no method signature changes, no stack layout changes, etc). On the downside, it also means we cannot have any of the associated cleanups and optimizations yet, but those are minor anyway. Also, there is still the risk that I make a mistake with the necessary factoring-out of the current implementation. If we agree that this should be the way to go, then I would close this PR, and continue work on #10907. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From eosterlund at openjdk.org Fri Nov 11 14:39:21 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 11 Nov 2022 14:39:21 GMT Subject: Integrated: 8295214: Generational ZGC: Guard nmethods from cross modifying code In-Reply-To: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> References: <Byj-Y1byGcaEyU0Zmh8zlRj0BITeYzXLJ-D_d9it1eU=.cd71d449-f4f2-4557-9c5c-99b7751fe664@github.com> Message-ID: <_eScMBnhbDbI5RYA9EOT5WOZvxqoTZt3vnVzds5x5E8=.581101f6-d804-471a-bb8d-63bfbae2f953@github.com> On Tue, 8 Nov 2022 16:19:47 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > Generational ZGC will need to patch nmethod instructions outside of safepoints, and guard entries into the nmethods with cross modifying code fences. This is mostly taken care of by nmethod entry barrier code. But there are a few entries that don't go through nmethod entry barriers that need fixing. In particular when entering an nmethod by returning through the stack watermark barrier.
This patch ensures that whenever the stack watermark barrier exposes a new nmethod, we also ensure that a cross modify fence is executed, so that any concurrently updated instructions can be safely executed. This pull request has now been integrated. Changeset: e7c2a8e6 Author: Erik ?sterlund <eosterlund at openjdk.org> URL: https://git.openjdk.org/jdk/commit/e7c2a8e60e35da0919119e919ed162217049e89f Stats: 25 lines in 3 files changed: 20 ins; 5 del; 0 mod 8295214: Generational ZGC: Guard nmethods from cross modifying code Reviewed-by: dholmes, rehn ------------- PR: https://git.openjdk.org/jdk/pull/11042 From redestad at openjdk.org Fri Nov 11 14:44:41 2022 From: redestad at openjdk.org (Claes Redestad) Date: Fri, 11 Nov 2022 14:44:41 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> Message-ID: <eoVEqxWCPLb4p8g4o7vhWf-cXMq1ORHPdtb4DUOG1rI=.4608f61f-6145-4c6e-9de1-0cd90a499315@github.com> On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). 
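As an aside on the polynomial hash being discussed: a minimal sketch of the scalar loop and a hand-unrolled variant, assuming the usual factor 31 and a caller-supplied start value (one for `Arrays`, zero for `String`). The class and method names below are hypothetical and are not the ones used in the PR; the sketch only shows why unrolling by four with precomputed powers of 31 gives the same result as the plain loop.

```java
/** Illustration of a polynomial hash with factor 31 (hypothetical names, not the PR's code). */
public class PolyHashSketch {
    /** Plain loop: h = 31 * h + a[i], starting from the given initial value. */
    public static int hashScalar(int[] a, int initial) {
        int h = initial;
        for (int v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    /** Same result, four elements per iteration, using 31^4, 31^3 and 31^2. */
    public static int hashUnrolled(int[] a, int initial) {
        int h = initial;
        int i = 0;
        for (; i + 3 < a.length; i += 4) {
            // int arithmetic wraps modulo 2^32, so regrouping the terms is safe
            h = 923521 * h      // 31^4
              + 29791 * a[i]    // 31^3
              + 961 * a[i + 1]  // 31^2
              + 31 * a[i + 2]
              + a[i + 3];
        }
        for (; i < a.length; i++) { // remaining 0..3 elements
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

With a start value of 1 this agrees with `java.util.Arrays.hashCode(int[])`, and with a start value of 0 over the chars of a string it agrees with `String.hashCode()`.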
>> It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Missing & 0xff in StringLatin1::hashCode

Full JMH result comparison, linux-x64: https://jmh.morethan.io/?gist=014b1f9242ae3ad84cbbab893b738d48

Faster on all microbenchmarks and all input sizes.
Up to 8x faster on large inputs. (Noting that the old StringHashCode::notCached and empty micros are not recalculating the hashCode since https://bugs.openjdk.org/browse/JDK-8221836 - the original intent of those microbenchmark was to test the hashing algorithm as per the new micros. We can probably remove those two..) ------------- PR: https://git.openjdk.org/jdk/pull/10847 From eosterlund at openjdk.org Fri Nov 11 14:49:42 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 11 Nov 2022 14:49:42 GMT Subject: RFR: 8296886: Fix various include sort order issues In-Reply-To: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> Message-ID: <ofs1xv7I47Zf48RVFGfNZWiCh5p6qk4xCpPXFjFxlL0=.98bda3b8-6600-4f70-a332-28424e55f082@github.com> On Fri, 11 Nov 2022 14:26:20 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. > > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. 
To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards. Hmm. Now that "jvm.hpp" becomes "include/jvm.hpp" it seems like it would make sense if the include guard for jvm.hpp reflected the relative file path, like most other files. I noticed that its current include guard is `#ifndef _JAVASOFT_JVM_H_`. When I search for "javasoft" I get no hits. I guess it isn't obvious if we can change it without introducing build issues for users out there? So maybe it should remain the way it is anyway. ------------- PR: https://git.openjdk.org/jdk/pull/11108 From eosterlund at openjdk.org Fri Nov 11 16:23:38 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 11 Nov 2022 16:23:38 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code Message-ID: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. In particular, 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. 
This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. 3) Refactoring the stack chunk allocation code Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. ------------- Commit messages: - Generational ZGC: Loom support Changes: https://git.openjdk.org/jdk/pull/11111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11111&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296875 Stats: 969 lines in 38 files changed: 636 ins; 225 del; 108 mod Patch: https://git.openjdk.org/jdk/pull/11111.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11111/head:pull/11111 PR: https://git.openjdk.org/jdk/pull/11111 From xliu at openjdk.org Fri Nov 11 17:54:27 2022 From: xliu at openjdk.org (Xin Liu) Date: Fri, 11 Nov 2022 17:54:27 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v3] In-Reply-To: <HzvC0hEgl-NTZ-I36novjuStycPmJG38kdYEA4DAGgE=.eee08df7-8725-423a-ba53-2cf1483ff6be@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <HzvC0hEgl-NTZ-I36novjuStycPmJG38kdYEA4DAGgE=.eee08df7-8725-423a-ba53-2cf1483ff6be@github.com> Message-ID: <Q-Q96KgbLLXuAge0hRyVQgtzFAgETWIYfj1vDeEvD_4=.6dad23b8-39f9-409c-9c71-ea543a6ebcf3@github.com> On Fri, 11 Nov 2022 08:47:39 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. 
>> >> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. >> >> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. >> >> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see JDK-[JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages where forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure an initial capacity. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Mark constructors explicit LGTM. I am not a reviewer. ------------- Marked as reviewed by xliu (Committer). PR: https://git.openjdk.org/jdk/pull/11086 From duke at openjdk.org Fri Nov 11 17:56:55 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 11 Nov 2022 17:56:55 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v15] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <g9b4K88VLczZ7zoHN0oisP9Qju0EvlQWW5voCyJlGPQ=.6363291f-20aa-4fbc-9a81-96af9b54c76f@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. 
> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
> - Added a JMH perf test.
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>
> Perf before:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>
> and after:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: Vladimir's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/835fbe3a..2a225e42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=13-14 Stats: 23 lines in 2 files changed: 0 ins; 2 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Nov 11 18:12:20 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 11 Nov 2022 18:12:20 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> Message-ID: <IgOknAH9HUWa5VQropGAIp7I5QXLUGf6IspI_QVEe-Q=.85cfcc50-3c66-4fd3-b23a-65c40bb87423@github.com> On Fri, 11 Nov 2022 01:26:40 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> live review with Sandhya > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 733: > >> 731: void andptr(Register src1, Register src2) { LP64_ONLY(andq(src1, src2)) NOT_LP64(andl(src1, src2)) ; } >> 732: >> 733: #ifdef _LP64 > > Why is it x64-specific? I believe it's needed. TLDR: a couple of check-ins ago, I broke the 32-bit build, and that was the 'easy' fix.
> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 161: > >> 159: const XMMRegister P2_H = xmm5; >> 160: const XMMRegister TMP1 = xmm6; >> 161: const Register polyCP = r13; > > Could be renamed to `rscratch` (or `tmp`) since it doesn't hold constant base address anymore. done ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Nov 11 18:12:23 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 11 Nov 2022 18:12:23 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v13] In-Reply-To: <S8GnNjm6Y0ggYmpkr3b0rCYNnOIm0zyZoUqwGA-MAbk=.4fffdc3e-e5f3-498c-84c0-3c3719cc9e14@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <AdNrtqNSMPZSK_Tfbr4q-YNlelgx8AEVeDtn-4EoV6Y=.15cead8f-06fd-40bd-9e39-fbb7ecc2cbd6@github.com> <S8GnNjm6Y0ggYmpkr3b0rCYNnOIm0zyZoUqwGA-MAbk=.4fffdc3e-e5f3-498c-84c0-3c3719cc9e14@github.com> Message-ID: <Une2D1sgxDyIg6YkPyIWxOg4MwrnGeWYw8TeRrlwIk0=.363a0d24-6822-49b9-962a-338d5f784b37@github.com> On Fri, 11 Nov 2022 01:25:07 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> jcheck > > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 252: > >> 250: private void processMultipleBlocks(byte[] input, int offset, int length, long[] aLimbs, long[] rLimbs) { >> 251: while (length >= BLOCK_LENGTH) { >> 252: n.setValue(input, offset, BLOCK_LENGTH, (byte)0x01); > > You could call `processBlock(input, offset, BLOCK_LENGTH);` here. done (duh.. 
thanks, neater code) ------------- PR: https://git.openjdk.org/jdk/pull/10582 From eosterlund at openjdk.org Fri Nov 11 19:44:27 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 11 Nov 2022 19:44:27 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code In-Reply-To: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> Message-ID: <yt4yTcfj4Q41X8EparF0OPS2CufoZw9pvBceg0o75X4=.5c93b33a-e01d-47d3-8ffd-e1099bc626cf@github.com> On Fri, 11 Nov 2022 16:16:18 GMT, Erik ?sterlund <eosterlund at openjdk.org> wrote: > The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. > > In particular, > 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. > > 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. > > 3) Refactoring the stack chunk allocation code > > Tested with tier1-5 and manually running Skynet. 
No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. Nice to have PR 11111. It's gonna take a long time until we see 111111. ------------- PR: https://git.openjdk.org/jdk/pull/11111 From vlivanov at openjdk.org Fri Nov 11 20:01:34 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Nov 2022 20:01:34 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <IgOknAH9HUWa5VQropGAIp7I5QXLUGf6IspI_QVEe-Q=.85cfcc50-3c66-4fd3-b23a-65c40bb87423@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> <IgOknAH9HUWa5VQropGAIp7I5QXLUGf6IspI_QVEe-Q=.85cfcc50-3c66-4fd3-b23a-65c40bb87423@github.com> Message-ID: <orY8dXwBoWdbN04iP_wTouuDWQl9aqjOFV9APiHY3Dc=.32ab3d07-8d44-4f04-85bd-cc51eb61f414@github.com> On Fri, 11 Nov 2022 18:08:50 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.hpp line 733: >> >>> 731: void andptr(Register src1, Register src2) { LP64_ONLY(andq(src1, src2)) NOT_LP64(andl(src1, src2)) ; } >>> 732: >>> 733: #ifdef _LP64 >> >> Why is it x64-specific? > > I believe its needed. > > TLDR.. Couple of check ins ago, I broke the 32-bit build, and that was the 'easy' fix.. Right, `addq` instructions are x64-specific. I was confused because `assembler_x86.hpp` doesn't declare them as such which is a bug. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Nov 11 20:10:33 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 11 Nov 2022 20:10:33 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <orY8dXwBoWdbN04iP_wTouuDWQl9aqjOFV9APiHY3Dc=.32ab3d07-8d44-4f04-85bd-cc51eb61f414@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> <IgOknAH9HUWa5VQropGAIp7I5QXLUGf6IspI_QVEe-Q=.85cfcc50-3c66-4fd3-b23a-65c40bb87423@github.com> <orY8dXwBoWdbN04iP_wTouuDWQl9aqjOFV9APiHY3Dc=.32ab3d07-8d44-4f04-85bd-cc51eb61f414@github.com> Message-ID: <ocwevLqOwgHaiJ46CLgPJFVvDcalz4PWPjREnWQ20A4=.187a1435-ee73-44a5-a15f-f4dc2e9e4a2e@github.com> On Fri, 11 Nov 2022 19:56:40 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> I believe its needed. >> >> TLDR.. Couple of check ins ago, I broke the 32-bit build, and that was the 'easy' fix.. > > Right, `addq` instructions are x64-specific. I was confused because `assembler_x86.hpp` doesn't declare them as such which is a bug. I am mystified at how it actually gets removed from the `assembler_x86.o` object on 32-bit.. The only reliable/portable way _would_ be with `#ifdef` but its not there.. so.. code-generation? `sed`-like preprocessing? Can one edit object files after the gcc ran? The build must be doing something clever!! Haven't seen it yet.. Whatever the trick is, `assembler_x86.hpp` gets it, but not `macroAssembler_x86.hpp`. If it doesn't ring any bells, maybe I will spend some more time looking at the traces, maybe can figure out what the build script is doing to remove the symbol. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Fri Nov 11 20:38:33 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Nov 2022 20:38:33 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <ocwevLqOwgHaiJ46CLgPJFVvDcalz4PWPjREnWQ20A4=.187a1435-ee73-44a5-a15f-f4dc2e9e4a2e@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> <IgOknAH9HUWa5VQropGAIp7I5QXLUGf6IspI_QVEe-Q=.85cfcc50-3c66-4fd3-b23a-65c40bb87423@github.com> <orY8dXwBoWdbN04iP_wTouuDWQl9aqjOFV9APiHY3Dc=.32ab3d07-8d44-4f04-85bd-cc51eb61f414@github.com> <ocwevLqOwgHaiJ46CLgPJFVvDcalz4PWPjREnWQ20A4=.187a1435-ee73-44a5-a15f-f4dc2e9e4a2e@github.com> Message-ID: <pYQs4FMpxhCrhceVFxyy7tvGrPG3t8RzbKfSOSPyzbs=.e3b7e01d-bbd7-4f73-bf57-245389bd5ec1@github.com> On Fri, 11 Nov 2022 20:08:27 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Right, `addq` instructions are x64-specific. I was confused because `assembler_x86.hpp` doesn't declare them as such which is a bug. > > I am mystified at how it actually gets removed from the `assembler_x86.o` object on 32-bit.. The only reliable/portable way _would_ be with `#ifdef` but its not there.. so.. code-generation? `sed`-like preprocessing? Can one edit object files after the gcc ran? The build must be doing something clever!! Haven't seen it yet.. > > Whatever the trick is, `assembler_x86.hpp` gets it, but not `macroAssembler_x86.hpp`. > > If it doesn't ring any bells, maybe I will spend some more time looking at the traces, maybe can figure out what the build script is doing to remove the symbol. 
It's not specific to `andq`: there's a huge `#ifdef` block around the definitions in `assembler_x86.hpp` (lines 12201 - 13773; and there's even a nested `#ifdef _LP64` (lines 13515-13585)!) , but declarations aren't guarded by `#ifdef _LP64`. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Nov 11 20:49:40 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 11 Nov 2022 20:49:40 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <pYQs4FMpxhCrhceVFxyy7tvGrPG3t8RzbKfSOSPyzbs=.e3b7e01d-bbd7-4f73-bf57-245389bd5ec1@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> <IgOknAH9HUWa5VQropGAIp7I5QXLUGf6IspI_QVEe-Q=.85cfcc50-3c66-4fd3-b23a-65c40bb87423@github.com> <orY8dXwBoWdbN04iP_wTouuDWQl9aqjOFV9APiHY3Dc=.32ab3d07-8d44-4f04-85bd-cc51eb61f414@github.com> <ocwevLqOwgHaiJ46CLgPJFVvDcalz4PWPjREnWQ20A4=.187a1435-ee73-44a5-a15f-f4dc2e9e4a2e@github.com> <pYQs4FMpxhCrhceVFxyy7tvGrPG3t8RzbKfSOSPyzbs=.e3b7e01d-bbd7-4f73-bf57-245389bd5ec1@github.com> Message-ID: <hp3DuRUsHA39f-YoJ2AuxY-7ZcVPKDTFXFVB46xsDBo=.d077e145-3b07-43c5-93f4-6d00c29f101f@github.com> On Fri, 11 Nov 2022 20:34:34 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> I am mystified at how it actually gets removed from the `assembler_x86.o` object on 32-bit.. The only reliable/portable way _would_ be with `#ifdef` but its not there.. so.. code-generation? `sed`-like preprocessing? Can one edit object files after the gcc ran? The build must be doing something clever!! Haven't seen it yet.. >> >> Whatever the trick is, `assembler_x86.hpp` gets it, but not `macroAssembler_x86.hpp`. 
>> >> If it doesn't ring any bells, maybe I will spend some more time looking at the traces, maybe can figure out what the build script is doing to remove the symbol. > > It's not specific to `andq`: there's a huge `#ifdef` block around the definitions in `assembler_x86.hpp` (lines 12201 - 13773; and there's even a nested `#ifdef _LP64` (lines 13515-13585)!), but declarations aren't guarded by `#ifdef _LP64`. Yeah, just got to about the same conclusion by looking at the preprocessor `-E` output: it's declared in the header, but not defined in the 'cpp' file. One would think that that's a compile error, but it's been more than a decade since I looked at the C++ spec; 'C++ compiler is always right'. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From stefank at openjdk.org Fri Nov 11 20:59:22 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 11 Nov 2022 20:59:22 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v3] In-Reply-To: <HzvC0hEgl-NTZ-I36novjuStycPmJG38kdYEA4DAGgE=.eee08df7-8725-423a-ba53-2cf1483ff6be@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <HzvC0hEgl-NTZ-I36novjuStycPmJG38kdYEA4DAGgE=.eee08df7-8725-423a-ba53-2cf1483ff6be@github.com> Message-ID: <zi9p-a-xzzC4aJlFUV1wlG0uMdeTbD5JhVXqL5o8wyM=.9a957d70-43b7-4e9b-882d-75e1f537eeb3@github.com> On Fri, 11 Nov 2022 08:47:39 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. >> >> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. >> >> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed.
>>
>> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see [JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure out an initial capacity.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>
> Mark constructors explicit

Thanks for reviewing!

-------------

PR: https://git.openjdk.org/jdk/pull/11086

From vlivanov at openjdk.org  Fri Nov 11 22:51:28 2022
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 11 Nov 2022 22:51:28 GMT
Subject: RFR: 8294033: x86_64: libm stubs are missing
In-Reply-To: <PujwGQBG3G_aqx78tLTBHSsRi8EcbELG8Uwe1T-q1Ns=.2f550f8d-c8fb-4e3b-ac7a-f9d68aed0ee0@github.com>
References: <PujwGQBG3G_aqx78tLTBHSsRi8EcbELG8Uwe1T-q1Ns=.2f550f8d-c8fb-4e3b-ac7a-f9d68aed0ee0@github.com>
Message-ID: <fWbEJLSyeGrm_asr7gx9phPJ50eLreFyHEA45pLPX-g=.70060ba2-d1a4-4227-9c45-f08301eb79a1@github.com>

On Fri, 11 Nov 2022 02:07:22 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> There's a regression from [JDK-8293285](https://bugs.openjdk.org/browse/JDK-8293285) refactoring where I forgot to call generate_libm_stubs() during stub initialization phase.
>
> The patch restores proper stub init sequence and also piles some minor refactorings on top.
>
> Testing: hs-tier1 - hs-tier2
>
> Thanks!

Thanks for the reviews, Jorn and Vladimir.
------------- PR: https://git.openjdk.org/jdk/pull/11100 From vlivanov at openjdk.org Fri Nov 11 22:53:00 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Nov 2022 22:53:00 GMT Subject: Integrated: 8294033: x86_64: libm stubs are missing In-Reply-To: <PujwGQBG3G_aqx78tLTBHSsRi8EcbELG8Uwe1T-q1Ns=.2f550f8d-c8fb-4e3b-ac7a-f9d68aed0ee0@github.com> References: <PujwGQBG3G_aqx78tLTBHSsRi8EcbELG8Uwe1T-q1Ns=.2f550f8d-c8fb-4e3b-ac7a-f9d68aed0ee0@github.com> Message-ID: <3s2imokNBdwWBug6RN59zbybO6ni_IkKdTgA3-IZfFk=.df1131f4-23de-4b3e-b618-7494dd16981d@github.com> On Fri, 11 Nov 2022 02:07:22 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: > There's a regression from [JDK-8293285](https://bugs.openjdk.org/browse/JDK-8293285) refactoring where I forgot to call generate_libm_stubs() during stub initialization phase. > > The patch restores proper stub init sequence and also piles some minor refactorings on top. > > Testing: hs-tier1 - hs-tier2 > > Thanks! This pull request has now been integrated. 
Changeset: 34a499de Author: Vladimir Ivanov <vlivanov at openjdk.org> URL: https://git.openjdk.org/jdk/commit/34a499de8edc9a6b750ae7af356fa9cb1d2a0748 Stats: 25 lines in 2 files changed: 6 ins; 9 del; 10 mod 8294033: x86_64: libm stubs are missing Reviewed-by: jvernee, kvn ------------- PR: https://git.openjdk.org/jdk/pull/11100 From kbarrett at openjdk.org Sat Nov 12 00:48:34 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 12 Nov 2022 00:48:34 GMT Subject: RFR: 8296886: Fix various include sort order issues In-Reply-To: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> Message-ID: <2lOleILw53UHm1WCmH4QPBsjbisb1h6H7Gz9jJRtJ8A=.139da306-4220-438c-be4c-64be6dc14c5d@github.com> On Fri, 11 Nov 2022 14:26:20 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. > > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. 
> Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot.
>
> While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards.

I don't think platform-specific files that are included without providing a directory should be sorted with the shared files on just the filename. That just scatters the platform-specific files, in a way that I think isn't helpful. Such files ought to be at the end, using the appropriate macros. But there are some filenames that don't have the appropriate stem for use with those macros. I think those files should be renamed. So I disagree with moving the includes of these files. I'd prefer they were left as-is by this change, with a followup to rename files and use the appropriate macros.

There are some files external to hotspot that are being included without any directory component, such as jni.h and jimage.hpp. Moving them into the middle of the "shared" includes, sorted by just filename without a directory, seems like it is hiding external dependencies. I don't have a well-formed suggestion for what to do about them, but I think the proposed moves aren't helpful.

I wonder if standard library includes should also be sorted. I think you might have done that in some places, but not sure about that. Certainly there are some that aren't.

There are a number of gtests where a standard library include was moved to after the include of unittest.hpp. I think unittest.hpp is supposed to always be the last included header, and I added a comment to those places. But now I can't find anything that says that. (BTW, I agree with moving things like threadHelper.inline.hpp into a block after normal includes, with unittest.hpp.)

Related, there are 62 .cpp files missing "precompiled.hpp"
`find . -name "*.cpp" -exec grep -L "precompiled.hpp" {} \;`
At least some (os_linux.cpp, os_posix.cpp) have a comment claiming intentionality, though that seems suspect (os_linux.cpp has been that way since before mercurial). I think this should be another followup.

I wish this change had been split up a bit. One change (or a separate commit within this change) that dealt with files in share/include could have been mechanically checked or pretty easily verified by eye. That would have left a much smaller residue that needed going through more carefully. Touching a whole mess of gtests to put a blank line between other includes and the final include of unittest.hpp could also have been separate (PR or commit).

Should there be any additions to the style guide to codify some of these changes? I can't find anything about where unittest.hpp and other gtest-specific headers should go. Nor can I find anything about where standard library headers should go.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 30:

> 28: #include "compiler/compiler_globals.hpp"
> 29: #include "compiler/disassembler.hpp"
> 30: #include "crc32c.h"

This ought to remain at the end, included using `CPU_HEADER_H("crc32c")`, but the file doesn't have the appropriate suffix. I think the file name should be changed so that can be done. This seems to be the only C++ file under cpu/ that doesn't have the appropriate suffix. There are similarly "misnamed" files under various os/ subdirs; I didn't look for any in os_cpu/.

src/hotspot/os/windows/jvm_windows.cpp line 27:

> 25: #include "precompiled.hpp"
> 26: #include "include/jvm.h"
> 27: #include "os_windows.hpp"

os_windows should be at the end, included using `OS_HEADER("os")`.

src/hotspot/os_cpu/bsd_zero/os_bsd_zero.cpp line 57:

> 55: #if !defined(__APPLE__) && !defined(__NetBSD__)
> 56: #include <pthread.h>
> 57: # include <pthread_np.h> /* For pthread_attr_get_np */

Remove unneeded space.
src/hotspot/share/cds/classListParser.cpp line 42: > 40: #include "interpreter/bytecodeStream.hpp" > 41: #include "interpreter/linkResolver.hpp" > 42: #include "jimage.hpp" Sorting jimage.hpp with hotspot/share stuff seems weird, and is kind of hiding this external dependency. (It's coming from java.base.) src/hotspot/share/prims/scopedMemoryAccess.cpp line 34: > 32: #include "runtime/interfaceSupport.inline.hpp" > 33: #include "runtime/jniHandles.inline.hpp" > 34: #include "runtime/deoptimization.hpp" runtime/deoptimization.hpp should be at the front of the runtime/ list. src/hotspot/share/prims/whitebox.cpp line 26: > 24: > 25: #include "precompiled.hpp" > 26: #include <new> Why was `<new>` removed? test/hotspot/gtest/jfr/test_adaptiveSampler.cpp line 48: > 46: #include "unittest.hpp" > 47: > 48: #include <cmath> Why is this after unittest.hpp? test/hotspot/gtest/metaprogramming/test_enableIf.cpp line 32: > 30: #include "unittest.hpp" > 31: > 32: #include <type_traits> Why is this following unittest.hpp. test/hotspot/gtest/runtime/test_os_linux.cpp line 28: > 26: #ifdef LINUX > 27: > 28: #include "os_linux.hpp" Why do we even need this - we have runtime/os.hpp and we're on Linux. test/hotspot/gtest/utilities/test_align.cpp line 31: > 29: #include "unittest.hpp" > 30: > 31: #include <limits> Why after unittest.hpp? test/hotspot/gtest/utilities/test_bitMap_setops.cpp line 34: > 32: #include "unittest.hpp" > 33: > 34: #include <stdlib.h> Why after unittest.hpp. test/hotspot/gtest/utilities/test_count_leading_zeros.cpp line 32: > 30: #include "unittest.hpp" > 31: > 32: #include <limits> [pre-existing] Why after unittest.hpp? test/hotspot/gtest/utilities/test_enumIterator.cpp line 29: > 27: #include "unittest.hpp" > 28: > 29: #include <type_traits> Why after unittest.hpp? test/hotspot/gtest/utilities/test_globalDefinitions.cpp line 33: > 31: #include "unittest.hpp" > 32: > 33: #include <type_traits> Why after unittest.hpp. 
test/hotspot/gtest/utilities/test_nonblockingQueue.cpp line 32:

> 30: #include "threadHelper.inline.hpp"
> 31: #include "unittest.hpp"
> 32: #include <new>

Why removing `<new>`? (and pre-existing, why after unittest.hpp?)

test/hotspot/gtest/utilities/test_population_count.cpp line 33:

> 31: #include "unittest.hpp"
> 32:
> 33: #include <limits>

Why after unittest.hpp?

test/hotspot/gtest/utilities/test_powerOfTwo.cpp line 31:

> 29: #include "unittest.hpp"
> 30:
> 31: #include <limits>

Why after unittest.hpp?

-------------

Changes requested by kbarrett (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11108

From fyang at openjdk.org  Sat Nov 12 01:33:34 2022
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 12 Nov 2022 01:33:34 GMT
Subject: RFR: 8296875: Generational ZGC: Refactor loom code
In-Reply-To: <yt4yTcfj4Q41X8EparF0OPS2CufoZw9pvBceg0o75X4=.5c93b33a-e01d-47d3-8ffd-e1099bc626cf@github.com>
References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com>
 <yt4yTcfj4Q41X8EparF0OPS2CufoZw9pvBceg0o75X4=.5c93b33a-e01d-47d3-8ffd-e1099bc626cf@github.com>
Message-ID: <9vLlu1jO4Rh1tE1-Fm5xb-79FvYNmyLw6ORbZEkZcvM=.3c47a0b1-1183-40c1-b86b-707d0ca03d18@github.com>

On Fri, 11 Nov 2022 19:41:56 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> Nice to have PR 11111. It's gonna take a long time until we see 111111.

Nice PR number :-) May I ask if you could also add handling for riscv while you are at it? We have ported loom to this platform recently [1].
[1] https://git.openjdk.org/jdk/commit/91292d56a9c2b8010466d105520e6e898ae53679 ------------- PR: https://git.openjdk.org/jdk/pull/11111 From vlivanov at openjdk.org Sat Nov 12 02:06:33 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 12 Nov 2022 02:06:33 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> Message-ID: <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. 
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score     Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±   0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±   0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±   0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±   7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score     Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±   0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±   0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±   0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ±  67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score     Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±   0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±   0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±   0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ±  38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±   0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±   0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±   0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±   6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±   0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±   0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±   0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±   7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±   0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±   0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±   0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ±  50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score     Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±   0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±   0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±   0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ±  62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±   0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±   0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±   0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ±  41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±   0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±   0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±   0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ±  51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±   0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±   0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±   0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ±  44.474  ns/op
>>
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
> Missing & 0xff in StringLatin1::hashCode

I haven't closely looked at the stub itself. Commented mostly on C2 and JDK parts.

src/hotspot/cpu/x86/x86_64.ad line 12073:

> 12071: legRegD tmp_vec13, rRegI tmp1, rRegI tmp2, rRegI tmp3, rFlagsReg cr)
> 12072: %{
> 12073: predicate(UseAVX >= 2 && ((VectorizedHashCodeNode*)n)->mode() == VectorizedHashCodeNode::LATIN1);

If you represent `VectorizedHashCodeNode::mode()` as an input, it would allow to abstract over supported modes and come up with a single AD instruction.
Take a look at `VectorMaskCmp` for an example (not a perfect one though since it has both _predicate member and constant input which is redundant). src/hotspot/cpu/x86/x86_64.ad line 12081: > 12079: format %{ "Array HashCode byte[] $ary1,$cnt1 -> $result // KILL all" %} > 12080: ins_encode %{ > 12081: __ arrays_hashcode($ary1$$Register, $cnt1$$Register, $result$$Register, What's the motivation to keep the stub code inlined instead of calling into a stand-alone pre-generated version of the stub? src/hotspot/share/opto/intrinsicnode.hpp line 175: > 173: // as well as adjusting for special treatment of various encoding of String > 174: // arrays. Must correspond to declared constants in jdk.internal.util.ArraysSupport > 175: typedef enum HashModes { LATIN1 = 0, UTF16 = 1, BYTE = 2, CHAR = 3, SHORT = 4, INT = 5 } HashMode; I question the need for `LATIN1` and `UTF16` modes. If you lift some of input adjustments (initial value and input size) into JDK, it becomes indistinguishable from `BYTE`/`CHAR`. Then you can reuse existing constants for basic types. src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 185: > 183: */ > 184: @IntrinsicCandidate > 185: public static int vectorizedHashCode(Object array, byte mode) { The intrinsic can be generalized by: 1. expanding `array` input into `base`, `offset`, and `length`. It will make it applicable to any type of data source (on-heap/off-heap `ByteBuffer`s, `MemorySegment`s. 2. passing initial value as a parameter. Basically, hash code computation can be represented as a reduction: `reduce(initial_val, (acc, v) -> 31 * acc + v, data)`. You hardcode the operation, but can make the rest variable. (Even the operation can be slightly generalized if you make 31 variable and then precompute the table at runtime. But right now I don't see much value in investing into that.) 
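
To make the reduction view above concrete, here is a minimal C++ sketch. It is not the JDK or HotSpot code: `poly_hash`, `poly_hash_lanes`, their parameter names, and the choice of 4 lanes are assumptions made purely for illustration, and unsigned 32-bit arithmetic stands in for Java's wrapping `int`. The first function is the plain reduction over `base`/`offset`/`length` with a caller-supplied initial value; the second computes the same value with independent accumulators, applying the per-lane weights only once at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar form of reduce(initial, (acc, v) -> 31 * acc + v, data).
// uint32_t wraps modulo 2^32, matching the bit pattern of Java int arithmetic.
inline uint32_t poly_hash(uint32_t initial, const uint8_t* base,
                          std::size_t offset, std::size_t length) {
  uint32_t acc = initial;
  for (std::size_t i = 0; i < length; i++) {
    acc = 31u * acc + base[offset + i];
  }
  return acc;
}

// Same result with 4 independent accumulators ("lanes"), hypothetical layout.
// Each step multiplies every lane by 31^4 and adds one element per lane;
// the lane weights 31^3, 31^2, 31, 1 are applied once, before the final sum.
inline uint32_t poly_hash_lanes(uint32_t initial, const uint8_t* base,
                                std::size_t offset, std::size_t length) {
  const std::size_t kLanes = 4;
  const uint32_t kPow4 = 31u * 31u * 31u * 31u;
  const uint32_t weights[4] = {31u * 31u * 31u, 31u * 31u, 31u, 1u};
  const uint8_t* p = base + offset;

  // Handle the first (length % 4) elements with the scalar loop, so the
  // remaining element count is a multiple of the lane count.
  const std::size_t head = length % kLanes;

  // The running hash goes into the last lane (weight 1): after m steps it has
  // been multiplied by 31^(4m), which is exactly the factor it needs.
  uint32_t lanes[4] = {0u, 0u, 0u, poly_hash(initial, p, 0, head)};

  for (std::size_t i = head; i < length; i += kLanes) {
    for (std::size_t j = 0; j < kLanes; j++) {
      lanes[j] = lanes[j] * kPow4 + p[i + j];
    }
  }

  uint32_t result = 0;
  for (std::size_t j = 0; j < kLanes; j++) {
    result += lanes[j] * weights[j];
  }
  return result;
}

// Usage sketch: both forms agree; initial value 0 follows the String
// convention, 1 the Arrays convention mentioned in the PR description.
inline void poly_hash_example() {
  const uint8_t abc[] = {'a', 'b', 'c'};
  assert(poly_hash(0, abc, 0, 3) == poly_hash_lanes(0, abc, 0, 3));
  assert(poly_hash(1, abc, 0, 3) == poly_hash_lanes(1, abc, 0, 3));
}
```

Only the multiplier 31 is baked into the sketch; the data source and the initial value stay parameters, which is the degree of generality suggested in the comment above.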
------------- PR: https://git.openjdk.org/jdk/pull/10847 From vlivanov at openjdk.org Sat Nov 12 02:06:33 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 12 Nov 2022 02:06:33 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> Message-ID: <corxcBxj97U75Q-5Eh1CfvjkyM8L8w7F1oduaYyf11g=.ca02efb8-76b0-4afe-af25-2c3aa6cd8b4b@github.com> On Sat, 12 Nov 2022 00:55:56 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing & 0xff in StringLatin1::hashCode > > src/hotspot/cpu/x86/x86_64.ad line 12081: > >> 12079: format %{ "Array HashCode byte[] $ary1,$cnt1 -> $result // KILL all" %} >> 12080: ins_encode %{ >> 12081: __ arrays_hashcode($ary1$$Register, $cnt1$$Register, $result$$Register, > > What's the motivation to keep the stub code inlined instead of calling into a stand-alone pre-generated version of the stub? Also, switching to stand-alone stubs would enable us to compose a generic stub version (as we do in `StubGenerator::generate_generic_copy()` for arraycopy stubs). But it would be even better to do the dispatching on JDK side and always pass a constant into the intrinsic. 
-------------

PR: https://git.openjdk.org/jdk/pull/10847

From vlivanov at openjdk.org  Sat Nov 12 02:10:40 2022
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Sat, 12 Nov 2022 02:10:40 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]
In-Reply-To: <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com>
References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com>
 <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com>
Message-ID: <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com>

On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>>
>> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score     Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±   0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±   0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±   0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±   7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score     Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±   0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±   0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±   0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ±  67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score     Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±   0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±   0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±   0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ±  38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±   0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±   0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±   0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±   6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±   0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±   0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±   0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±   7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±   0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±   0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±   0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ±  50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score     Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±   0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±   0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±   0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ±  62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±   0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±   0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±   0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ±  41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±   0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±   0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±   0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ±  51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±   0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±   0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±   0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ±  44.474  ns/op
>>
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
> Missing & 0xff in StringLatin1::hashCode

Also, I'd like to note that C2 auto-vectorization support is not too far away from being able to optimize hash code computations.
At some point, I was able to achieve some promising results with modest tweaking of SuperWord pass: https://github.com/iwanowww/jdk/blob/superword/notes.txt http://cr.openjdk.java.net/~vlivanov/superword.reduction/webrev.00/ ------------- PR: https://git.openjdk.org/jdk/pull/10847 From stuefe at openjdk.org Sat Nov 12 07:05:32 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 12 Nov 2022 07:05:32 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled [v2] In-Reply-To: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> Message-ID: <ZAKMwzzFOAR6YsA_gXh7KSggKCieyLdwiKqTaNZUohU=.fa44c02f-57c2-4df0-9f85-1a4d4343c201@github.com> > While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. > > The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : > > > #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \ > NativeCallStack(0) : NativeCallStack::empty_stack()) > #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ? \ > NativeCallStack(1) : NativeCallStack::empty_stack()) > > > and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc: > > > void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack); > > > In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()). 
> > However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots): > > > 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>: > ... > # Load tracking level > cb9a77: 48 8d 1d 02 35 78 00 lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE> > cb9a7e: 8b 03 mov (%rbx),%eax > # detail (3) tracking? > cb9a80: 83 f8 03 cmp $0x3,%eax > # yes: go and collect callstack > cb9a83: 0f 84 57 01 00 00 je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180> > # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals: > cb9a89: 48 8d 05 30 44 78 00 lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE> > cb9a90: f3 0f 6f 00 movdqu (%rax),%xmm0 > cb9a94: f3 0f 6f 48 10 movdqu 0x10(%rax),%xmm1 > cb9a99: 0f 11 45 c0 movups %xmm0,-0x40(%rbp) > cb9a9d: 0f 11 4d d0 movups %xmm1,-0x30(%rbp) > ... > # do the actual malloc: > cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> > > # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX): > cb9b0f: 48 8d 4d c0 lea -0x40(%rbp),%rcx > ... > cb9b19: e8 f2 b7 f3 ff callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> > > > This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled. > > --------------------- > > The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only. > > This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC. 
> > In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway: > > > 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>: > ... > # load tracking level > cb9907: 48 8d 1d 72 46 78 00 lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE> > cb990e: 8b 03 mov (%rbx),%eax > # detail (3) tracking? > cb9910: 83 f8 03 cmp $0x3,%eax > # yes: go and collect callstack > cb9913: 0f 84 37 01 00 00 je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160> > # no: nothing more to do ... > ... > # do the actual malloc: > cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> > ... > # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway: > cb9987: 48 8d 4d c0 lea -0x40(%rbp),%rcx > .. > cb9991: e8 ba b8 f3 ff callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> > > > There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent. > > -------------- > > Results: > > When profiling, I see os::malloc now needs less cycles, and the hotspot around the xmm instructions is not there anymore. 
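
The copy described above can be reproduced outside HotSpot. The following is a minimal sketch, assuming a C++17 compiler; `Stack`, `empty_stack`, `consume`, `old_style`, `new_style` and `g_copies` are hypothetical stand-ins for illustration and are not `NativeCallStack` or the real `CURRENT_PC`/`CALLER_PC` macros. When one operand of `?:` is a temporary and the other an lvalue of the same class type, the whole conditional expression yields a temporary, so the lvalue branch copy-constructs it even though the callee takes a const reference.

```cpp
#include <cassert>
#include <cstddef>

static int g_copies = 0;   // counts copy-constructor calls (illustration only)

// Hypothetical stand-in for a call stack holder with 4 pointer-sized slots.
struct Stack {
  const void* frames[4];

  Stack() {}                                   // no-op: frames stay uninitialized
  explicit Stack(int /*skip*/) : frames{} {}   // stands in for a real stack walk
  Stack(const Stack& other) {
    for (std::size_t i = 0; i < 4; i++) frames[i] = other.frames[i];
    g_copies++;
  }
};

// Pre-created "empty stack" singleton, returned by reference.
inline const Stack& empty_stack() {
  static const Stack singleton(0);
  return singleton;
}

// The callee takes a const reference and only reads the stack in detail mode.
inline bool consume(const Stack& s, bool detail) {
  return detail && s.frames[0] != nullptr;
}

// Old shape: temporary on one side, lvalue singleton on the other.
// The expression as a whole is a temporary, so the singleton gets copied.
inline bool old_style(bool detail) {
  return consume(detail ? Stack(0) : empty_stack(), detail);
}

// New shape: both sides are temporaries; the no-op constructor leaves
// nothing to copy and nothing to initialize.
inline bool new_style(bool detail) {
  return consume(detail ? Stack(0) : Stack(), detail);
}

// Usage sketch for the non-detail path.
inline void copy_demo() {
  g_copies = 0;
  old_style(false);
  assert(g_copies == 1);   // the singleton was copied into a temporary
  g_copies = 0;
  new_style(false);
  assert(g_copies == 0);   // assumes C++17 guaranteed copy elision
}
```

The zero-copy result for `new_style` relies on C++17 elision rules; under older standards the compiler is allowed, but not required, to elide that copy.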
Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - reduce unnecessary diffs - explicit constructor for fake callstacks; revert default ctor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11040/files - new: https://git.openjdk.org/jdk/pull/11040/files/a9327485..8657b2d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11040&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11040&range=00-01 Stats: 35 lines in 5 files changed: 20 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/11040.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11040/head:pull/11040 PR: https://git.openjdk.org/jdk/pull/11040 From stuefe at openjdk.org Sat Nov 12 07:12:39 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 12 Nov 2022 07:12:39 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled In-Reply-To: <A43rcL2ECFfsPx0O49yVfdBLe6DtEVGBaHguxjI2cXw=.d079db0f-9d81-4a15-a883-ceb8c4640cb1@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <A43rcL2ECFfsPx0O49yVfdBLe6DtEVGBaHguxjI2cXw=.d079db0f-9d81-4a15-a883-ceb8c4640cb1@github.com> Message-ID: <FvpJtql17pzl7QRphRPjvJi0PEncsmjcwyry2hvIvRI=.7f60f06e-fde4-4ee9-85ba-43b305325e2b@github.com> On Thu, 10 Nov 2022 20:41:49 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. >> >> The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : >> >> >> #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(0) : NativeCallStack::empty_stack()) >> #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ? 
\ >> NativeCallStack(1) : NativeCallStack::empty_stack()) >> >> >> and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc: >> >> >> void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack); >> >> >> In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()). >> >> However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots): >> >> >> 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>: >> ... >> # Load tracking level >> cb9a77: 48 8d 1d 02 35 78 00 lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE> >> cb9a7e: 8b 03 mov (%rbx),%eax >> # detail (3) tracking? >> cb9a80: 83 f8 03 cmp $0x3,%eax >> # yes: go and collect callstack >> cb9a83: 0f 84 57 01 00 00 je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180> >> # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals: >> cb9a89: 48 8d 05 30 44 78 00 lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE> >> cb9a90: f3 0f 6f 00 movdqu (%rax),%xmm0 >> cb9a94: f3 0f 6f 48 10 movdqu 0x10(%rax),%xmm1 >> cb9a99: 0f 11 45 c0 movups %xmm0,-0x40(%rbp) >> cb9a9d: 0f 11 4d d0 movups %xmm1,-0x30(%rbp) >> ... 
>> # do the actual malloc: >> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> >> >> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX): >> cb9b0f: 48 8d 4d c0 lea -0x40(%rbp),%rcx >> ... >> cb9b19: e8 f2 b7 f3 ff callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled. >> >> --------------------- >> >> The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only. >> >> This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC. >> >> In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway: >> >> >> 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>: >> ... >> # load tracking level >> cb9907: 48 8d 1d 72 46 78 00 lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE> >> cb990e: 8b 03 mov (%rbx),%eax >> # detail (3) tracking? >> cb9910: 83 f8 03 cmp $0x3,%eax >> # yes: go and collect callstack >> cb9913: 0f 84 37 01 00 00 je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160> >> # no: nothing more to do ... >> ... >> # do the actual malloc: >> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> >> ... >> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). 
The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway: >> cb9987: 48 8d 4d c0 lea -0x40(%rbp),%rcx >> .. >> cb9991: e8 ba b8 f3 ff callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent. >> >> -------------- >> >> Results: >> >> When profiling, I see os::malloc now needs less cycles, and the hotspot around the xmm instructions is not there anymore. > > (I commented before but for some reason it's been lost). > > There's one use of the default constructor that you've missed (I found that by removing the body of the NativeCallStack() constructor): > > > ReservedMemoryRegion(const ReservedMemoryRegion& rr) : > VirtualMemoryRegion(rr.base(), rr.size()) { > *this = rr; > } > > > I think it will be much safer to leave the existing default constructor, and have something like: > > > private: > NativeCallStack(int dummy) { > _dummy[0] = NULL; > } > > public: > inline static NativeCallStack fake_stack() { > NativeCallStack fake(0); > return fake; > } > > > This will keep the behavior the same as before. Note that your patch will change the behavior if the fake stack is actually used. E.g., for this function: > > > inline bool is_empty() const { > return _stack[0] == NULL; > } > > > If you are absolutely sure that the fake stacks are never used, and really want to get rid of the `_stack[0] = NULL`, I would suggest adding a new debug-only field, and add asserts like this in all public functions: > > > inline bool is_empty() const { > assert(!is_fake, "sanity"); > return _stack[0] == NULL; > } After thinking about @iklam s feedback, I changed the patch and introduced an explicit constructor to build fake callstacks for CALLER_PC and CURRENT_PC. That leaves the current mechanics of empty callstacks unchanged. 
The fake constructor will leave the object uninitialized in release builds, so no construction costs there. It will zap it with a test pattern in debug builds. That test pattern will be asserted to make sure we don't accidentally use them. I refrained from making any further changes, but plan to do future improvements in follow up RFEs. ------------- PR: https://git.openjdk.org/jdk/pull/11040 From fyang at openjdk.org Sat Nov 12 08:12:32 2022 From: fyang at openjdk.org (Fei Yang) Date: Sat, 12 Nov 2022 08:12:32 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code In-Reply-To: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> Message-ID: <K4UA3mF3XBvUjVUy9c15q0CxlTgu0hsH7vpI2FUHz1w=.972d2f82-5819-40f1-a443-dc6c4e94f44b@github.com> On Fri, 11 Nov 2022 16:16:18 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. > > In particular, > 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. > > 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on.
To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. > > 3) Refactoring the stack chunk allocation code > > Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. PS: I see JVM crashes when running Skynet with extra VM option: -XX:+VerifyContinuations on linux-aarch64 platform. $java --enable-preview -XX:+VerifyContinuations Skynet # A fatal error has been detected by the Java Runtime Environment: # after -XX: or in .hotspotrc: SuppressErrorAt=# # Internal Error/stackChunkOop.cpp (/home/realfyang/openjdk-jdk/src/hotspot/share/oops/stackChunkOop.cpp:433), pid=1904185:433, tid=1904206 [thread 1904216 also had an error]# assert(_chunk->bitmap().at(index)) failed: Bit not set at index 208 corresponding to 0x0000000637c512d0 # # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.realfyang.openjdk-jdk) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.realfyang.openjdk-jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) ------------- PR: https://git.openjdk.org/jdk/pull/11111 From duke at openjdk.org Sat Nov 12 15:29:21 2022 From: duke at openjdk.org (Piotr Tarsa) Date: Sat, 12 Nov 2022 15:29:21 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> Message-ID: 
<mTgCRyZ8rmuNMWQfZuyaTrR0u40uNOuPcPL9ChOYqCE=.b1c8e88d-5eaa-4c6d-a22d-efed2cfe678d@github.com>

On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>>
>> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>>
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
> Missing & 0xff in StringLatin1::hashCode

Out of curiosity: how does this intrinsic affect time-to-safepoint? Does it matter? I don't see any safepoint poll, but then I don't precisely know how safepoints work, so I could be missing something. Theoretically, with 2^31 elements count limit in Java, the whole computation is always a fraction of a second, but maybe it would matter with e.g. ZGC, which could ask for a safepoint while the thread is hashing an array with 2 billion ints.

> 1. expanding `array` input into `base`, `offset`, and `length`. It will make it applicable to any type of data source (on-heap/off-heap `ByteBuffer`s, `MemorySegment`s.

There could be memory-mapped ByteBuffers and MemorySegments and that would make the whole hashing operation much more prone to be exceedingly long and therefore possibly dramatically affecting time-to-safepoint. Again, this could be misunderstanding on my side, but I'm curious how safepoints interplay with this intrinsic.

Also, even without memory mapping, MemorySegments can be much larger than 2^31 elements in size, so hashing a huge MemorySegment could take much longer than hashing even the biggest ordinary (limited by 31-bit indexing) array of primitives.
------------- PR: https://git.openjdk.org/jdk/pull/10847 From jwaters at openjdk.org Sat Nov 12 16:45:19 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 12 Nov 2022 16:45:19 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features Message-ID: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler. These cases were caught by the very briefly integrated 8296115 ------------- Commit messages: - More Style Changes - Minor Style Changes - Merge remote-tracking branch 'upstream/master' into conformance - Missed an attribute - Oops v2 - Resolve Conflict - This will likely fail, but that's not important - Further formatting - Merge jdkMerge remote-tracking branch 'upstream/master' into warnings - Code style - ... 
and 53 more: https://git.openjdk.org/jdk/compare/657a0b2f...6d85c432 Changes: https://git.openjdk.org/jdk/pull/11081/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11081&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295146 Stats: 254 lines in 48 files changed: 100 ins; 23 del; 131 mod Patch: https://git.openjdk.org/jdk/pull/11081.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11081/head:pull/11081 PR: https://git.openjdk.org/jdk/pull/11081 From stuefe at openjdk.org Sat Nov 12 20:44:43 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 12 Nov 2022 20:44:43 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors Message-ID: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. --- Patch - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. - Removed a stray newline from print_native_stack to clean output. - added regression testing for this feature. I removed my name from the test since we don't do this anymore. - added clarifying comments to the test and code - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) Output looks like this: $ java ... 
-XX:+ErrorLogSecondaryErrorDetails will produce, for secondary errors, siginfo and call stack. [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) C [libc.so.6+0x43090] V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) ] ------------- Commit messages: - JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors Changes: https://git.openjdk.org/jdk/pull/11118/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296907 Stats: 95 lines in 3 files changed: 71 ins; 4 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/11118.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11118/head:pull/11118 PR: https://git.openjdk.org/jdk/pull/11118 From yadongwang at openjdk.org Sun Nov 13 03:11:28 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Sun, 13 Nov 2022 03:11:28 GMT Subject: Integrated: 8296630: Fix SkipIfEqual on AArch64 and RISC-V In-Reply-To: 
<QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com> References: <QCu9D-fYAk4Y8mxpC8rUXQB6wHGExfdUglPnJ0YWV5E=.0ff0fc5d-50a2-44a0-806d-873dc85aff28@github.com> Message-ID: <H8pKRZr2Ld2D4KwznqBU52liG93ZMbSge3AnCNtkU3k=.ae3b9725-5f93-4733-8fad-5cc7d5dba6b7@github.com> On Thu, 10 Nov 2022 03:17:37 GMT, Yadong Wang <yadongwang at openjdk.org> wrote: > SkipIfEqual was supposed to load a flag value from some memory, compare it with an input boolean value, and jump to a specific label if they are equal. The implementation on x86 and s390 platforms meets expectations, and ppc uses SkipIfEqualZero. However, on AArch64 and RISC-V platforms, the input argument "value" is not used, and only a jump-if-equal-zero is generated. That's not correct, but works well since only false is passed at all call sites so far. > > AArch64 tier1, riscv hotspot & jdk tier1 have been tested. > Additional cases with dtrace tested on AArch64: > test/hotspot/jtreg/serviceability/dtrace/DTraceOptionsTest.java > test/hotspot/jtreg/compiler/runtime/Test8168712.java This pull request has now been integrated. Changeset: a2cdcdd6 Author: Yadong Wang <yadongwang at openjdk.org> Committer: Fei Yang <fyang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/a2cdcdd65dbbc6717c363fc4e22d9b16a4dea986 Stats: 10 lines in 2 files changed: 8 ins; 0 del; 2 mod 8296630: Fix SkipIfEqual on AArch64 and RISC-V Reviewed-by: ngasson, fyang, luhenry, aph ------------- PR: https://git.openjdk.org/jdk/pull/11076 From xuelei at openjdk.org Sun Nov 13 06:39:00 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Sun, 13 Nov 2022 06:39:00 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 Message-ID: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Hi, May I have this update reviewed? The sprintf is deprecated in Xcode 14 because of security concerns, and its use causes build failures.
The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. Thanks, Xuelei ------------- Commit messages: - use const size_t - size correction - size correction - 8296812: sprintf is deprecated in Xcode 14 Changes: https://git.openjdk.org/jdk/pull/11115/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296812 Stats: 109 lines in 24 files changed: 15 ins; 3 del; 91 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Sun Nov 13 07:54:34 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 13 Nov 2022 07:54:34 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <wr5oXl5Ezc50JWzrdwCRd0v3CD4CxKlHH7zipsCQ9g4=.7ae4bc7c-b449-4396-8775-af3f4da3da84@github.com> On Fri, 11 Nov 2022 22:41:19 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. > > Thanks, > Xuelei Hi @XueleiFan, could you use `jio_snprintf` instead (see include/jvm_io.h)? That is what we usually do for snprintf. jio_snprintf hides platform particularities wrt snprintf. 
Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Sun Nov 13 08:28:37 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Sun, 13 Nov 2022 08:28:37 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 In-Reply-To: <wr5oXl5Ezc50JWzrdwCRd0v3CD4CxKlHH7zipsCQ9g4=.7ae4bc7c-b449-4396-8775-af3f4da3da84@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <wr5oXl5Ezc50JWzrdwCRd0v3CD4CxKlHH7zipsCQ9g4=.7ae4bc7c-b449-4396-8775-af3f4da3da84@github.com> Message-ID: <JWNzymkzvyUWTmhVij_o9bBRpdj-ZiSYAqfRcGylsFs=.2e40defa-4f49-4892-8d2a-26f324024def@github.com> On Sun, 13 Nov 2022 07:50:43 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > could you use `jio_snprintf` instead (see include/jvm_io.h)? That is what we usually do for snprintf. jio_snprintf hides platform particularities wrt snprintf. > Good to know that. Thank you! While I was doing the replacement from `snprintf` to `jio_snprintf`, I noticed a lot of existing use of `snprintf` in the files touched in this PR. What do you think if we have a `snprintf` clean up in a followed PR? hotspot $ find . 
-type f |xargs grep snprintf |grep -v jio_snprintf |wc 262 1895 26574 ------------- PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Sun Nov 13 09:01:31 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 13 Nov 2022 09:01:31 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 In-Reply-To: <JWNzymkzvyUWTmhVij_o9bBRpdj-ZiSYAqfRcGylsFs=.2e40defa-4f49-4892-8d2a-26f324024def@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <wr5oXl5Ezc50JWzrdwCRd0v3CD4CxKlHH7zipsCQ9g4=.7ae4bc7c-b449-4396-8775-af3f4da3da84@github.com> <JWNzymkzvyUWTmhVij_o9bBRpdj-ZiSYAqfRcGylsFs=.2e40defa-4f49-4892-8d2a-26f324024def@github.com> Message-ID: <XKbf12XHrXIVkeJhUEdRrGgo_xBstNB2Swf3t6WkU-E=.8692e760-9c98-44c2-a74e-3157a700ed5c@github.com> On Sun, 13 Nov 2022 08:25:57 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: > > could you use `jio_snprintf` instead (see include/jvm_io.h)? That is what we usually do for snprintf. jio_snprintf hides platform particularities wrt snprintf. > > Good to know that. Thank you! > > While I was doing the replacement from `snprintf` to `jio_snprintf`, I noticed a lot of existing use of `snprintf` in the files touched in this PR. What do you think if we have a `snprintf` clean up in a follow-up PR? > > ``` > hotspot $ find . -type f |xargs grep snprintf |grep -v jio_snprintf |wc > 262 1895 26574 > ``` Hmm, possibly. We may look again at the exact reason why we use jio_snprintf. Maybe it is less important nowadays, with fewer platforms (no Solaris) and Windows being more standards-conformant than it had been in the past. Let's hear what others think.
------------- PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Sun Nov 13 10:30:05 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 13 Nov 2022 10:30:05 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks [v2] In-Reply-To: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> Message-ID: <zxmLO7iE3Mine5TpSbY86TDrCjFPpnfTfjOtQjbbuXk=.fcc194a4-05a5-482b-a887-f53819af032d@github.com> > When doing performance- and memory analysis, `AlwaysPreTouch` option is very handy for reducing noise. It would be good to have a similar option for pre-touching thread stacks. In addition to reducing noise, it can serve as worst-case test for thread costs, as well as a test for NMT regressions. > > Patch adds a new diagnostic switch, `AlwaysPreTouchStacks`, as a companion switch to `AlwaysPreTouch`. Touching is super-simple using `alloca()`. Also, regression test. > > Examples: > > NMT, thread stacks, 10000 Threads, default: > > > - Thread (reserved=10332400KB, committed=331828KB) > (thread #10021) > (stack: reserved=10301560KB, committed=300988KB) > (malloc=19101KB #60755) > (arena=11739KB #20037) > > > NMT, thread stacks, 10000 Threads, +AlwaysPreTouchStacks: > > > - Thread (reserved=10332400KB, committed=10284360KB) > (thread #10021) > (stack: reserved=10301560KB, committed=10253520KB) > (malloc=19101KB #60755) > (arena=11739KB #20037) Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - use shadow_zone_safe_limit - Use os::pretouch_memory - AlwaysPreTouchStacks ------------- Changes: https://git.openjdk.org/jdk/pull/10403/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10403&range=01 Stats: 118 lines in 4 files changed: 118 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10403.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10403/head:pull/10403 PR: https://git.openjdk.org/jdk/pull/10403 From aturbanov at openjdk.org Sun Nov 13 11:17:34 2022 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Sun, 13 Nov 2022 11:17:34 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors In-Reply-To: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> Message-ID: <o-asZub5oMRaRYIwVPib18RZg-GUvTXgBpw8YHoKlYE=.e56e5285-d948-4c96-a80c-896253d3e8cb@github.com> On Sat, 12 Nov 2022 10:52:33 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. > > To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. > > --- > > Patch > > - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. > - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. 
> - Removed a stray newline from print_native_stack to clean output. > - added regression testing for this feature. I removed my name from the test since we don't do this anymore. > - added clarifying comments to the test and code > - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) > > Output looks like this: > > > $ java ... -XX:+ErrorLogSecondaryErrorDetails > > > will produce, for secondary errors, siginfo and call stack. > > > [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] > [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] > [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) > V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) > V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) > V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) > V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) > V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) > C [libc.so.6+0x43090] > V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) > C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) > C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) > ] test/hotspot/jtreg/runtime/ErrorHandling/SecondaryErrorTest.java line 159: > 157: > 158: if (currentPattern < pattern.length) { > 159: throw new RuntimeException("hs-err file incomplete (first missing pattern: " + pattern[currentPattern] + ")"); Suggestion: throw new RuntimeException("hs-err file incomplete 
(first missing pattern: " + pattern[currentPattern] + ")"); ------------- PR: https://git.openjdk.org/jdk/pull/11118 From stuefe at openjdk.org Sun Nov 13 11:32:46 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 13 Nov 2022 11:32:46 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v2] In-Reply-To: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> Message-ID: <YtHH6c7C6qbfZgFogPjZbJO0WISUGkNAjRaPG09UaXM=.ec25e9eb-e607-429e-a6f4-c54a1edd662c@github.com> > This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. > > To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. > > --- > > Patch > > - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. > - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. > - Removed a stray newline from print_native_stack to clean output. > - added regression testing for this feature. I removed my name from the test since we don't do this anymore. > - added clarifying comments to the test and code > - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) > > Output looks like this: > > > $ java ... -XX:+ErrorLogSecondaryErrorDetails > > > will produce, for secondary errors, siginfo and call stack. 
> > > [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] > [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] > [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) > V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) > V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) > V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) > V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) > V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) > C [libc.so.6+0x43090] > V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) > C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) > C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) > ] Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/runtime/ErrorHandling/SecondaryErrorTest.java remove blank Co-authored-by: Andrey Turbanov <turbanoff at gmail.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11118/files - new: https://git.openjdk.org/jdk/pull/11118/files/ef8a3da2..52ca0ff7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11118.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11118/head:pull/11118 PR: https://git.openjdk.org/jdk/pull/11118 From stuefe at openjdk.org 
Sun Nov 13 11:34:29 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 13 Nov 2022 11:34:29 GMT Subject: RFR: 8296796: Provide clean, platform-agnostic interface to C-heap trimming In-Reply-To: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> Message-ID: <p16kaZRxeYm1rIwKuzmtkNcFrKWnPfo40051MW31QBM=.2478a190-190c-48fd-8f04-7cf5310720e7@github.com> On Thu, 10 Nov 2022 13:23:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. > > We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. That will allow us to add implementations for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier. x86 test error unrelated ------------- PR: https://git.openjdk.org/jdk/pull/11089 From stuefe at openjdk.org Sun Nov 13 12:13:46 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 13 Nov 2022 12:13:46 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks [v3] In-Reply-To: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> Message-ID: <GU17f4vx2t1Zgnn_ia5vdsdomquKEb01QcDHt_l6pJQ=.8fa59373-c5d9-41c5-ae84-d287cae40839@github.com> > When doing performance- and footprint analysis, `AlwaysPreTouch` option is very handy for reducing noise. It would be good to have a similar option for pre-touching thread stacks.
In addition to reducing noise, it can serve as worst-case test for thread costs, as well as a test for NMT regressions. > > Patch adds a new diagnostic switch, `AlwaysPreTouchStacks`, as a companion switch to `AlwaysPreTouch`. Touching is super-simple using `alloca()`. Also, regression test. > > Examples: > > NMT, thread stacks, 10000 Threads, default: > > > - Thread (reserved=10332400KB, committed=331828KB) > (thread #10021) > (stack: reserved=10301560KB, committed=300988KB) > (malloc=19101KB #60755) > (arena=11739KB #20037) > > > NMT, thread stacks, 10000 Threads, +AlwaysPreTouchStacks: > > > - Thread (reserved=10332400KB, committed=10284360KB) > (thread #10021) > (stack: reserved=10301560KB, committed=10253520KB) > (malloc=19101KB #60755) > (arena=11739KB #20037) Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: AlwaysPreTouchStacks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10403/files - new: https://git.openjdk.org/jdk/pull/10403/files/9cc43613..322dc77f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10403&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10403&range=01-02 Stats: 39 lines in 2 files changed: 35 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10403.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10403/head:pull/10403 PR: https://git.openjdk.org/jdk/pull/10403 From redestad at openjdk.org Sun Nov 13 19:54:44 2022 From: redestad at openjdk.org (Claes Redestad) Date: Sun, 13 Nov 2022 19:54:44 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <mTgCRyZ8rmuNMWQfZuyaTrR0u40uNOuPcPL9ChOYqCE=.b1c8e88d-5eaa-4c6d-a22d-efed2cfe678d@github.com> References: 
<dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <mTgCRyZ8rmuNMWQfZuyaTrR0u40uNOuPcPL9ChOYqCE=.b1c8e88d-5eaa-4c6d-a22d-efed2cfe678d@github.com> Message-ID: <pM2vZcm-QLZHS74sE03J-do6qrCpDTUBmU4TONgFo5k=.e2cacaa6-efbf-4d9c-9f41-f9d258c59f4c@github.com> On Sat, 12 Nov 2022 15:27:09 GMT, Piotr Tarsa <duke at openjdk.org> wrote: > Out of curiosity: how does this intrinsic affect time-to-safepoint? Does it matter? I don't see any safepoint poll, but then I don't precisely know how safepoints work, so I could be missing something. Theoretically, with 2^31 elements count limit in Java, the whole computation is always a fraction of a second, but maybe it would matter with e.g. ZGC, which could ask for a safepoint while the thread is hashing an array with 2 billion ints. This intrinsic - like several others before it - does not add safepoint checks. There's at least one RFE filed to address this deficiency, and hopefully we can come up with a shared strategy to interleave safepoint checks in the various intrinsics that operate over Strings and arrays: https://bugs.openjdk.org/browse/JDK-8233300 When I brought this up to an internal discussion with @TobiHartmann and @fisk last week several challenges were brought up to the table, including how to deal with all the different contingencies that might be the result of a safepoint, including deoptimization. I think enhancing these intrinsics to poll for safepoints is important to tackle tail-end latencies for extremely latency sensitive applications. In the meantime those applications could (should?) turn off such intrinsics, avoid huge arrays altogether, or both. 
------------- PR: https://git.openjdk.org/jdk/pull/10847 From stuefe at openjdk.org Sun Nov 13 20:08:21 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 13 Nov 2022 20:08:21 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address Message-ID: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. 
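Editorial illustration of the point about the old crash address: the sketch below assumes x86-64 with 48-bit virtual addresses (4-level paging), where an address is canonical only if bits 63..47 are all equal and user space occupies the lower canonical half. Under that assumption, 0xABC0000000000ABC is not canonical, which is consistent with the SI_KERNEL code and the zero si_addr reported above. The helper names are hypothetical and are not HotSpot code; other architectures and 5-level paging use different limits.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only, not part of HotSpot.
// Assumption: x86-64 with 48-bit virtual addresses (4-level paging).

// An address is canonical when bits 63..47 are all zeros or all ones.
constexpr bool is_canonical_48(std::uint64_t addr) {
  const std::uint64_t top17 = addr >> 47;  // bits 63..47
  return top17 == 0 || top17 == 0x1FFFF;
}

// User space is the lower canonical half: bits 63..47 all zero.
constexpr bool is_user_space_48(std::uint64_t addr) {
  return (addr >> 47) == 0;
}
```

A replacement crash address that is meant to yield SEGV_MAPERR would need to pass `is_user_space_48` and additionally be unmapped in the process; the second condition cannot be checked by arithmetic alone.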
------------- Commit messages: - JDK-8296906-VMError-controlled_crash-crashes-with-wrong-code-and-address Changes: https://git.openjdk.org/jdk/pull/11122/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11122&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296906 Stats: 223 lines in 3 files changed: 220 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11122.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11122/head:pull/11122 PR: https://git.openjdk.org/jdk/pull/11122 From kbarrett at openjdk.org Sun Nov 13 20:50:24 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 13 Nov 2022 20:50:24 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <klNVgLaAprREVI2aALAP1V9p7KHz_B2pyUhoFBJgqvo=.6742030d-5184-44e6-9b03-0c59c2a8d8a6@github.com> On Fri, 11 Nov 2022 22:41:19 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. > > Thanks, > Xuelei Please don't add uses of `jio_snprintf` or `::snprintf` to hotspot. Use `os::snprintf`. Regarding `jio_snprintf`, see https://bugs.openjdk.org/browse/JDK-8198918. Regarding `os::snprintf` and `os::vsnprintf`, see https://bugs.openjdk.org/browse/JDK-8285506. I think the only reason we haven't marked `::sprintf` and `::snprintf` forbidden (FORBID_C_FUNCTION) is there are a lot of uses, and nobody has gotten around to dealing with it. 
`::snprintf` is in the list of candidates for https://bugs.openjdk.org/browse/JDK-8214976, some of which have already been marked. But I don't see new bugs for the as-yet unmarked ones. As a general note, as a reviewer my preference is against non-trivial and persnickety code changes that are scattered all over the code base. For something like this I'd prefer multiple more bite-sized changes that were dealing with specific uses. I doubt everyone agrees with me though. ------------- PR: https://git.openjdk.org/jdk/pull/11115 From kbarrett at openjdk.org Sun Nov 13 20:54:32 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 13 Nov 2022 20:54:32 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks In-Reply-To: <j_nc7lnePGF3rMpnlsERrZjufWAWxTsahtZp1z13WQk=.521e0a62-9c75-426b-ac15-f6b63b4f69da@github.com> References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> <nyPkIJiAmw69cI1CZAlmR2Km0-bVCxXuM3VZR6pVPUs=.b4515771-acf9-4653-ae6c-892d9f508b66@github.com> <j_nc7lnePGF3rMpnlsERrZjufWAWxTsahtZp1z13WQk=.521e0a62-9c75-426b-ac15-f6b63b4f69da@github.com> Message-ID: <Fwg13t4cqJUgK4rpn-89Z61X5HEhWWoyaHWVtApQlgQ=.1581e8d4-e0b4-40d5-afd4-98df7d9d57ae@github.com> On Fri, 23 Sep 2022 13:58:53 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > > Drive-by comment: there is `os::pretouch_memory(void* start, void* end, size_t page_size)` ;) > > Good point. Had to cast the volatile away though. Just happened to notice this going by. Maybe the signature for pretouch_memory should have volatile qualifiers?
------------- PR: https://git.openjdk.org/jdk/pull/10403 From redestad at openjdk.org Sun Nov 13 21:01:08 2022 From: redestad at openjdk.org (Claes Redestad) Date: Sun, 13 Nov 2022 21:01:08 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> Message-ID: <qxFxCcOGasCUw3NBap82yIuqJi-D3p1JY7MSDjejtEU=.5ac1a504-d754-4d2b-95d3-68384834f721@github.com> On Sat, 12 Nov 2022 01:06:27 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing & 0xff in StringLatin1::hashCode > > src/hotspot/cpu/x86/x86_64.ad line 12073: > >> 12071: legRegD tmp_vec13, rRegI tmp1, rRegI tmp2, rRegI tmp3, rFlagsReg cr) >> 12072: %{ >> 12073: predicate(UseAVX >= 2 && ((VectorizedHashCodeNode*)n)->mode() == VectorizedHashCodeNode::LATIN1); > > If you represent `VectorizedHashCodeNode::mode()` as an input, it would allow to abstract over supported modes and come up with a single AD instruction. Take a look at `VectorMaskCmp` for an example (not a perfect one though since it has both _predicate member and constant input which is redundant). Thanks for the pointer, I'll check it out! 
------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Sun Nov 13 21:01:09 2022 From: redestad at openjdk.org (Claes Redestad) Date: Sun, 13 Nov 2022 21:01:09 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <corxcBxj97U75Q-5Eh1CfvjkyM8L8w7F1oduaYyf11g=.ca02efb8-76b0-4afe-af25-2c3aa6cd8b4b@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> <corxcBxj97U75Q-5Eh1CfvjkyM8L8w7F1oduaYyf11g=.ca02efb8-76b0-4afe-af25-2c3aa6cd8b4b@github.com> Message-ID: <oSWFF_vJRWorkI2hn_t8oTt771UhqKbLYe0bdnrXHuI=.268d7735-b07a-4c62-8209-6093a5e17a8b@github.com> On Sat, 12 Nov 2022 01:10:50 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> src/hotspot/cpu/x86/x86_64.ad line 12081: >> >>> 12079: format %{ "Array HashCode byte[] $ary1,$cnt1 -> $result // KILL all" %} >>> 12080: ins_encode %{ >>> 12081: __ arrays_hashcode($ary1$$Register, $cnt1$$Register, $result$$Register, >> >> What's the motivation to keep the stub code inlined instead of calling into a stand-alone pre-generated version of the stub? > > Also, switching to stand-alone stubs would enable us to compose a generic stub version (as we do in `StubGenerator::generate_generic_copy()` for arraycopy stubs). But it would be even better to do the dispatching on JDK side and always pass a constant into the intrinsic. There is no single reason this code evolved the way it did. @luhenry worked on it initially and was guided towards intrinsifying what was originally a JDK-level unrolling. Then I took over and have tried to find a path of least resistance from there. @luhenry and I have discussed rewriting part or all of this as a stub, for various reasons.
I've been scoping that out, but with no experience writing stub versions I figured perhaps this could be done in a follow-up. If you think there's a compelling enough reason to rewrite this as a stub up front I can try and find the time to do so. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Sun Nov 13 21:03:26 2022 From: redestad at openjdk.org (Claes Redestad) Date: Sun, 13 Nov 2022 21:03:26 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> Message-ID: <a7v9TcVjwr5esqqk7zOuMHmnoIv1vIqBoAFnhn4YhoU=.0907e089-2789-492f-9210-1ba3abbf00a3@github.com> On Sat, 12 Nov 2022 01:28:51 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing & 0xff in StringLatin1::hashCode > > src/hotspot/share/opto/intrinsicnode.hpp line 175: > >> 173: // as well as adjusting for special treatment of various encoding of String >> 174: // arrays. Must correspond to declared constants in jdk.internal.util.ArraysSupport >> 175: typedef enum HashModes { LATIN1 = 0, UTF16 = 1, BYTE = 2, CHAR = 3, SHORT = 4, INT = 5 } HashMode; > > I question the need for `LATIN1` and `UTF16` modes. If you lift some of input adjustments (initial value and input size) into JDK, it becomes indistinguishable from `BYTE`/`CHAR`. Then you can reuse existing constants for basic types. 
UTF16 can easily be replaced with CHAR by lifting up the shift as you say, but LATIN1 needs to be distinguished from BYTE since the former needs unsigned semantics. Modeling in a signed/unsigned input is possible, but I figured we might as well call it UNSIGNED_BYTE and decouple it logically from String::LATIN1. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Sun Nov 13 21:07:38 2022 From: redestad at openjdk.org (Claes Redestad) Date: Sun, 13 Nov 2022 21:07:38 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> Message-ID: <TeRTnFrEqjkHPNPdxk3AAGCzryAXMuucxA-W0ZdKg_8=.bed2b3f2-37e6-42c9-9e02-55dd85d41e7f@github.com> On Sat, 12 Nov 2022 01:35:39 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing & 0xff in StringLatin1::hashCode > > src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 185: > >> 183: */ >> 184: @IntrinsicCandidate >> 185: public static int vectorizedHashCode(Object array, byte mode) { > > The intrinsic can be generalized by: > 1. expanding `array` input into `base`, `offset`, and `length`. It will make it applicable to any type of data source (on-heap/off-heap `ByteBuffer`s, `MemorySegment`s. > 2. passing initial value as a parameter. > > Basically, hash code computation can be represented as a reduction: `reduce(initial_val, (acc, v) -> 31 * acc + v, data)`. You hardcode the operation, but can make the rest variable. 
> > (Even the operation can be slightly generalized if you make 31 variable and then precompute the table at runtime. But right now I don't see much value in investing into that.) I've been thinking of generalizing as thus as a possible follow-up: get the base operation on entire arrays in, then generalize carefully while ensuring that doesn't add too much complexity, introduce unforeseen overheads etc. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Sun Nov 13 21:12:21 2022 From: redestad at openjdk.org (Claes Redestad) Date: Sun, 13 Nov 2022 21:12:21 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com> Message-ID: <mAsTs03BWlAM_YhHXHBNsaJIQqrX95deQyYAYkJwrmE=.7e36f69c-dffc-462a-bf8a-7614153bf8c4@github.com> On Sat, 12 Nov 2022 02:08:19 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: > Also, I'd like to note that C2 auto-vectorization support is not too far away from being able to optimize hash code computations. At some point, I was able to achieve some promising results with modest tweaking of SuperWord pass: https://github.com/iwanowww/jdk/blob/superword/notes.txt http://cr.openjdk.java.net/~vlivanov/superword.reduction/webrev.00/ Intriguing. How far off is this - and do you think it'll be able to match the efficiency we see here with a memoized coefficient table etc? 
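Editorial illustration of two ideas from this thread: the reduction `reduce(initial_val, (acc, v) -> 31 * acc + v, data)` with a variable initial value, and the "memoized coefficient table" used when unrolling. The sketch below is a scalar stand-in written for this archive, not the intrinsic under review; the function names are hypothetical. It also shows why a signed byte mode and an unsigned (Latin-1 style) byte mode cannot share results: the element widening differs.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar sketch, not the HotSpot intrinsic.
// Results are uint32_t bit patterns; unsigned arithmetic wraps modulo 2^32,
// which mirrors Java int overflow without undefined behaviour.

// Widen an element to 32 bits, sign-extending signed types and
// zero-extending unsigned ones.
template <typename T>
constexpr std::uint32_t widen(T v) {
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(v));
}

// Reference reduction: acc = 31 * acc + v, starting from `initial`.
template <typename T>
std::uint32_t hash_reference(const T* data, std::size_t len, std::uint32_t initial) {
  std::uint32_t h = initial;
  for (std::size_t i = 0; i < len; i++) {
    h = 31u * h + widen(data[i]);
  }
  return h;
}

// Same result, four elements per iteration, using precomputed powers of 31.
template <typename T>
std::uint32_t hash_unrolled4(const T* data, std::size_t len, std::uint32_t initial) {
  constexpr std::uint32_t P1 = 31u;
  constexpr std::uint32_t P2 = P1 * 31u;
  constexpr std::uint32_t P3 = P2 * 31u;
  constexpr std::uint32_t P4 = P3 * 31u;
  std::uint32_t h = initial;
  std::size_t i = 0;
  for (; i + 4 <= len; i += 4) {
    h = P4 * h + P3 * widen(data[i]) + P2 * widen(data[i + 1])
      + P1 * widen(data[i + 2]) + widen(data[i + 3]);
  }
  for (; i < len; i++) {  // remaining 0..3 elements
    h = 31u * h + widen(data[i]);
  }
  return h;
}
```

Calling either function on `std::int8_t` data sign-extends each element, while `std::uint8_t` data is zero-extended, so the two agree only when no element has its top bit set.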
If we turn this intrinsic into a stub we might also be able to reuse the optimization in other places, including from within the VM (calculating String hashCodes happen in a couple of places, including String deduplication). So I think there are still a few compelling reasons to go the manual route and continue on this path. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From xuelei at openjdk.org Sun Nov 13 22:55:30 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Sun, 13 Nov 2022 22:55:30 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v2] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <nd7HSOhS0hdjnZWztbIPSm_jx2TdYZdWsgzmSGHsYzg=.d522c721-1233-43e8-be5e-9f4fbfb09ac2@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with two additional commits since the last revision: - use os::snprintf for desktop update - use os::snprintf ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/e4724c5f..a66f58bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=00-01 Stats: 41 lines in 18 files changed: 0 ins; 0 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From dholmes at openjdk.org Sun Nov 13 22:56:20 2022 From: dholmes at openjdk.org (David Holmes) Date: Sun, 13 Nov 2022 22:56:20 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <J9AH8qxXd0iRgx13p-Eo5OxeoZWINDBGTsD41ijcZTo=.7259fe6e-ae46-45d2-8265-2c2b8d9e161b@github.com> References: <mgQHdsI_oHeWVQEOQNQLrfplcvatEauNPfq1rEswJF4=.cc842c02-e7e0-4f72-95a0-1033ce101cfe@github.com> <TlQR1R0Jt3DqqMWNwZzUyRpSs7lqI0Cig2zpUnmYI3s=.e3add258-443b-4adb-9b31-9f9a76042ff4@github.com> <NDsoMk5BjB0oLGW6pQagOrm-CWrPjc-wwfRhA3vJt6g=.10119bb3-2d8b-4d53-bfaa-9f7a01dabfd7@github.com> <TCLX4fpqVeou-wQDj1SBil2xIyIzk2NFcj7pPpz_xjs=.4152872e-18aa-426b-b967-68118f7ba62d@github.com> <6KaO6YDJAQZSps49h6TddX8-aXFEfOFCfLgpi1_90Ag=.d7fe0ac9-d392-4784-a13e-85f5212e00f1@github.com> <J9AH8qxXd0iRgx13p-Eo5OxeoZWINDBGTsD41ijcZTo=.7259fe6e-ae46-45d2-8265-2c2b8d9e161b@github.com> Message-ID: <dAHJOQEW2lOjuRlBvrars4ojYB0vLTag-Pq2_Zo26D8=.c273f2bb-3224-46ba-8c35-577dcce64d24@github.com> On Fri, 11 Nov 2022 14:35:22 GMT, Roman Kennke <rkennke at openjdk.org> wrote: >>> So the data structure for lock records (per thread) could consist of a series of distinct values [ A B C ] and each of the values could be repeated, but only adjacently: [ A A A B C C ] for example. 
>> @rose00 why only adjacently? Nested locking can be interleaved on different monitors. > > @dholmes-ora and all: I have prepared an alternative PR #10907 that implements the fast-locking behind a new experimental flag, and preserves the current stack-locking behavior as the default setting. It is currently implemented and tested on x86* and aarch64 arches. It is also less invasive because it keeps everything structurally the same (i.e. no method signature changes, no stack layout changes, etc). On the downside, it also means we can not have any of the associated cleanups and optimizations yet, but those are minor anyway. Also, there still is the risk that I make a mistake with the necessary factoring-out of current implementation. If we agree that this should be the way to go, then I would close this PR, and continue work on #10907. @rkennke not unexpectedly I greatly prefer the optional and opt-in version in PR https://github.com/openjdk/jdk/pull/10907. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From xuelei at openjdk.org Sun Nov 13 22:58:24 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Sun, 13 Nov 2022 22:58:24 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 In-Reply-To: <klNVgLaAprREVI2aALAP1V9p7KHz_B2pyUhoFBJgqvo=.6742030d-5184-44e6-9b03-0c59c2a8d8a6@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <klNVgLaAprREVI2aALAP1V9p7KHz_B2pyUhoFBJgqvo=.6742030d-5184-44e6-9b03-0c59c2a8d8a6@github.com> Message-ID: <-ER02i5fNCaZX-v56giR7gCbuMkwwpdhg5LBLThVvds=.402ad601-58e3-4529-935a-151e5e76b2b3@github.com> On Sun, 13 Nov 2022 20:48:04 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please don't add uses of `jio_snprintf` or `::snprintf` to hotspot. Use `os::snprintf`. Updated to use os::snprintf, except the files under adlc where the os::snprintf definition is not included. The use of snprintf could be cleaned up with existing code in the future.
> > Regarding `jio_snprintf`, see https://bugs.openjdk.org/browse/JDK-8198918. Regarding `os::snprintf` and `os::vsnprintf`, see https://bugs.openjdk.org/browse/JDK-8285506. > > I think the only reason we haven't marked `::sprintf` and `::snprintf` forbidden (FORBID_C_FUNCTION) is there are a lot of uses, and nobody has gotten around to dealing with it. `::snprintf` in the list of candidates for https://bugs.openjdk.org/browse/JDK-8214976, some of which have already been marked. But I don't see new bugs for the as-yet unmarked ones. > > As a general note, as a reviewer my preference is against non-trivial and persnickety code changes that are scattered all over the code base. For something like this I'd prefer multiple more bite-sized changes that were dealing with specific uses. I doubt everyone agrees with me though. It makes sense to me. I'd better focus on the building issue in this PR. Thank you for the review! ------------- PR: https://git.openjdk.org/jdk/pull/11115 From dholmes at openjdk.org Sun Nov 13 23:29:06 2022 From: dholmes at openjdk.org (David Holmes) Date: Sun, 13 Nov 2022 23:29:06 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features In-Reply-To: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> Message-ID: <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> On Thu, 10 Nov 2022 06:20:41 GMT, Julian Waters <jwaters at openjdk.org> wrote: > After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler. 
These cleanups were highlighted by the very briefly integrated 8296115 > > No changes to the behaviour of the JDK has resulted in any way from this commit This looks good in general. It is a pity there is so much simple moving of where "attributes" are listed, as it makes it look like the changes are more extensive than they really are - that said I prefer to see the attributes appear before a function/method signature rather than after, or somewhere in-between. A few other comments below. Thanks. make/autoconf/flags-cflags.m4 line 632: > 630: if test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang; then > 631: STATIC_LIBS_CFLAGS="$STATIC_LIBS_CFLAGS -ffunction-sections -fdata-sections \ > 632: -DJNIEXPORT='[[gnu::visibility(\"hidden\")]]'" So IIUC we now use attributes via the C++11 syntax rather than compiler-specific syntax - even where the C++11 syntax is referring to a compiler specific attribute. Is that right? src/hotspot/os/linux/os_perf_linux.cpp line 233: > 231: * Ensure that 'fmt' does _NOT_ contain the first two "%d %s" > 232: */ > 233: SCANF_ARGS(2, 0) static int vread_statdata(const char* procfile, _SCANFMT_ const char* fmt, va_list args) { If `SCANF_ARGS` can/must come first then I suggest adding a newline after it so the method signature is easier to spot. Applied everywhere of course. src/hotspot/os/windows/os_windows.hpp line 35: > 33: class Thread; > 34: > 35: static unsigned __stdcall thread_native_entry(Thread*); Why was this removed? This is needed to correctly specify the call sequence for the thread entry routine when used with `_beginThreadex`: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/beginthread-beginthreadex?view=msvc-170 src/hotspot/share/cds/filemap.hpp line 482: > 480: > 481: // Errors. 
> 482: ATTRIBUTE_PRINTF(1, 2) static void fail_stop(const char *msg, ...); Again I suggest a newline after `ATTRIBUTE_PRINTF` src/hotspot/share/utilities/compilerWarnings.hpp line 47: > 45: #endif > 46: > 47: #ifndef PRAGMA_DISABLE_VISCPP_WARNING Why rename this from `MSVC` to `VISCPP`? IIRC the full name is Microsoft Visual Studio C++, so your new name is not obviously better and the change just adds noise to the PR. Further `MSVC` matches what MS themselves use and even the attribute namespace in C++11 is `MSVC`. Update: I see the inconsistency with `compilerWarnings_visCPP.hpp` src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 37: > 35: #endif > 36: > 37: #if defined(__clang_major__) && \ Not clear why this was moved ?? src/hotspot/share/utilities/debug.hpp line 172: > 170: void report_fatal(VMErrorType error_type, const char* file, int line, const char* detail_fmt, ...) ATTRIBUTE_PRINTF(4, 5); > 171: void report_vm_out_of_memory(const char* file, int line, size_t size, VMErrorType vm_err_type, > 172: const char* detail_fmt, ...) ATTRIBUTE_PRINTF(5, 6); Why were the ATTRIBUTE_PRINTFs removed?
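For illustration only, a minimal sketch of the placement suggested in the comments above: the printf-format attribute on its own line, ahead of the declaration. It assumes GCC or Clang; `MY_ATTRIBUTE_PRINTF`, `format_message` and `check_format_message` are hypothetical names and not the HotSpot macros or functions.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a printf-format attribute macro (assumption:
// GCC/Clang syntax; expands to nothing on other compilers).
#if defined(__GNUC__) || defined(__clang__)
#define MY_ATTRIBUTE_PRINTF(fmt, vargs) __attribute__((format(printf, fmt, vargs)))
#else
#define MY_ATTRIBUTE_PRINTF(fmt, vargs)
#endif

// Attribute first, then a newline, so the signature itself is easy to spot.
MY_ATTRIBUTE_PRINTF(1, 2)
std::string format_message(const char* fmt, ...);

std::string format_message(const char* fmt, ...) {
  char buf[128];
  va_list args;
  va_start(args, fmt);
  // vsnprintf never writes past sizeof(buf) and always NUL-terminates.
  std::vsnprintf(buf, sizeof(buf), fmt, args);
  va_end(args);
  return std::string(buf);
}

// Small self-check showing the function still behaves like printf.
inline void check_format_message() {
  assert(format_message("%d-%s", 7, "x") == "7-x");
}
```

With the attribute in place, a mismatched call such as `format_message("%d", "text")` should draw a `-Wformat` warning from GCC or Clang regardless of whether the attribute is written before or after the declarator.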
------------- PR: https://git.openjdk.org/jdk/pull/11081 From dholmes at openjdk.org Mon Nov 14 00:16:16 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 00:16:16 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v3] In-Reply-To: <HzvC0hEgl-NTZ-I36novjuStycPmJG38kdYEA4DAGgE=.eee08df7-8725-423a-ba53-2cf1483ff6be@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <HzvC0hEgl-NTZ-I36novjuStycPmJG38kdYEA4DAGgE=.eee08df7-8725-423a-ba53-2cf1483ff6be@github.com> Message-ID: <Srzk-lLV5JsKV7KGHIMbRbON36doqa2xXVJ-2KrCEpo=.7501c608-51c5-439e-844f-83151ad87989@github.com> On Fri, 11 Nov 2022 08:47:39 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. >> >> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. >> >> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. >> >> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see [JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure an initial capacity.
> > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Mark constructors explicit > When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure an initial capacity. The default initial capacity is 2, the explicit initial capacities are not 2 and in many cases >>>2. So without an explicit capacity passed in these GrowableArrays would likely waste a lot of time unnecessarily growing. I don't think either of these parameters should really have a default value - in which case the order could have remained as it was. ------------- PR: https://git.openjdk.org/jdk/pull/11086 From jwaters at openjdk.org Mon Nov 14 01:22:47 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 14 Nov 2022 01:22:47 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features In-Reply-To: <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> Message-ID: <0fVP40VVRuOoZCEJ1M3BLubshBHbD4m_lj-j1qaGTTk=.391ade50-b3fe-4d4e-ae71-ba8a975a31cd@github.com> On Sun, 13 Nov 2022 22:59:01 GMT, David Holmes <dholmes at openjdk.org> wrote: >> After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler.
These cleanups were highlighted by the very briefly integrated 8296115 >> >> No changes to the behaviour of the JDK has resulted in any way from this commit > > src/hotspot/os/windows/os_windows.hpp line 35: > >> 33: class Thread; >> 34: >> 35: static unsigned __stdcall thread_native_entry(Thread*); > > Why was this removed? This is needed to correctly specify the call sequence for the thread entry routine when used with `_beginThreadex`: > https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/beginthread-beginthreadex?view=msvc-170 I'm not sure I follow, I didn't remove anything here? > src/hotspot/share/utilities/compilerWarnings.hpp line 47: > >> 45: #endif >> 46: >> 47: #ifndef PRAGMA_DISABLE_VISCPP_WARNING > > Why rename this from `MSVC` to `VISCPP`? IIRC the full name is Microsoft Visual Studio C++, so your new name is not obviously better and the change just adds noise to the PR. Further `MSVC` matches what MS themselves use and even the attribute namespace in C++11 is `MSVC`.
> Update: I see the inconsistency with `compilerWarnings_visCPP.hpp` Yep, it was renamed since the file is also named VISCPP, and I felt that matching the names was a good style change ------------- PR: https://git.openjdk.org/jdk/pull/11081 From jwaters at openjdk.org Mon Nov 14 01:38:18 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 14 Nov 2022 01:38:18 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features In-Reply-To: <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> Message-ID: <jxCazhUqSAhkx-jLo3Y1rBwdhQmzGYKEzUzIKRDbZms=.117eb027-ed31-422a-af60-2b40f3ae1cce@github.com> On Sun, 13 Nov 2022 23:07:35 GMT, David Holmes <dholmes at openjdk.org> wrote: >> After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler. These cleanups were highlighted by the very briefly integrated 8296115 >> >> No changes to the behaviour of the JDK has resulted in any way from this commit > > src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 37: > >> 35: #endif >> 36: >> 37: #if defined(__clang_major__) && \ > > Not clear why this was moved ?? 
I'm not sure which one you're referring to, but the PRAGMA_DIAG_PUSH/POP was moved up to the top of the header to match compilerWarnings_visCPP.hpp, and PRAGMA_DISABLE_GCC_WARNING_AUX was moved to macros.hpp as the more general PRAGMA macro, since it's useful for all compilers and not just gcc ------------- PR: https://git.openjdk.org/jdk/pull/11081 From dholmes at openjdk.org Mon Nov 14 01:39:31 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 01:39:31 GMT Subject: RFR: 8296796: Provide clean, platform-agnostic interface to C-heap trimming In-Reply-To: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> Message-ID: <0eSjQ4TmTMudluqBAowiwXl4hpgwDXvQYEy9gRsKTEo=.82c8ca83-9a1a-458f-9909-1531c484f7fb@github.com> On Thu, 10 Nov 2022 13:23:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. > > We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. That will allow us to add implementations for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier. This looks good for doing what it says, but I have to wonder whether it is actually worthwhile doing this unless most OS/lib will support it? What will the implementation be in AIX? Thanks src/hotspot/share/utilities/globalDefinitions.hpp line 376: > 374: > 375: #define PROPERFMT SIZE_FORMAT "%s" > 376: #define PROPERFMTARGS(S) byte_size_in_proper_unit(S), proper_unit_for_byte_size(S) style nit?
lower-case 's' ------------- PR: https://git.openjdk.org/jdk/pull/11089 From jwaters at openjdk.org Mon Nov 14 01:41:24 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 14 Nov 2022 01:41:24 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features In-Reply-To: <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> Message-ID: <JifpzUjotcu252D3-usFaP6YYOm26UkGxNigCl8TyS4=.ad93c4a9-b88d-40f5-adc4-461e78311a5b@github.com> On Sun, 13 Nov 2022 23:08:53 GMT, David Holmes <dholmes at openjdk.org> wrote: >> After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler. These cleanups were highlighted by the very briefly integrated 8296115 >> >> No changes to the behaviour of the JDK has resulted in any way from this commit > > src/hotspot/share/utilities/debug.hpp line 172: > >> 170: void report_fatal(VMErrorType error_type, const char* file, int line, const char* detail_fmt, ...) ATTRIBUTE_PRINTF(4, 5); >> 171: void report_vm_out_of_memory(const char* file, int line, size_t size, VMErrorType vm_err_type, >> 172: const char* detail_fmt, ...) ATTRIBUTE_PRINTF(5, 6); > > Why were the ATTRIBUTE_PRINTFs removed? 
The ATTRIBUTE_PRINTF macros are still there, just moved in front of the methods ------------- PR: https://git.openjdk.org/jdk/pull/11081 From jwaters at openjdk.org Mon Nov 14 01:46:16 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 14 Nov 2022 01:46:16 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features In-Reply-To: <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> Message-ID: <cvaF0K8j70TF2vlmgBHzQYAOiYSeP7uZ_Y4yavC0J2w=.e69dd78a-7b52-4a00-969e-e4a195623c2b@github.com> On Sun, 13 Nov 2022 23:16:47 GMT, David Holmes <dholmes at openjdk.org> wrote: >> After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler. These cleanups were highlighted by the very briefly integrated 8296115 >> >> No changes to the behaviour of the JDK has resulted in any way from this commit > > make/autoconf/flags-cflags.m4 line 632: > >> 630: if test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang; then >> 631: STATIC_LIBS_CFLAGS="$STATIC_LIBS_CFLAGS -ffunction-sections -fdata-sections \ >> 632: -DJNIEXPORT='[[gnu::visibility(\"hidden\")]]'" > > So IIUC we now use attributes via the C++11 syntax rather than compiler-specific syntax - even where the C++11 syntax is referring to a compiler specific attribute. Is that right? 
Yep, just something that C++ does a little neater, at least in my view ------------- PR: https://git.openjdk.org/jdk/pull/11081 From dholmes at openjdk.org Mon Nov 14 01:57:23 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 01:57:23 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v2] In-Reply-To: <nd7HSOhS0hdjnZWztbIPSm_jx2TdYZdWsgzmSGHsYzg=.d522c721-1233-43e8-be5e-9f4fbfb09ac2@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <nd7HSOhS0hdjnZWztbIPSm_jx2TdYZdWsgzmSGHsYzg=.d522c721-1233-43e8-be5e-9f4fbfb09ac2@github.com> Message-ID: <Di-INP4TYouTaezufCIe4hYyclEwJkG-8NuA7LXad8M=.d94d0b86-3415-4383-b3db-80468ec01d44@github.com> On Sun, 13 Nov 2022 22:55:30 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causes a build failure. The build could pass if warnings are disabled for code that uses the sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with two additional commits since the last revision: > > - use os::snprintf for desktop update > - use os::snprintf The hotspot changes seem okay using os::snprintf. The adlc changes to use raw snprintf also seem okay for now - I'm not sure whether the platform differences for snprintf affect adlc. The desktop change is wrong - you can't use os::snprintf there. Thanks. src/java.desktop/macosx/native/libjsound/PLATFORM_API_MacOSX_Ports.cpp line 638: > 636: return; > 637: } > 638: os::snprintf(channelName, 16, "Ch %d", ch); You can't use this here - this is not hotspot code! ------------- Changes requested by dholmes (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11115 From dholmes at openjdk.org Mon Nov 14 02:13:24 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 02:13:24 GMT Subject: RFR: 8296886: Fix various include sort order issues In-Reply-To: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> Message-ID: <nIA9wC-xmdZZ-ifksYR9KLG85-H-GeSR3Nc73Kg9a4U=.75c6e156-d9ea-4e9f-abb1-d99833c04f1a@github.com> On Fri, 11 Nov 2022 14:26:20 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. > > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that doesn't have any practical purpose, IMHO. It also goes against our written style guide around include files. One argument why it was OK to have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards.
This seems very disruptive and I'm not sure there is any real gain. From a practical view how does this potentially impact the ability to do clean backports? I don't agree with the changed treatment of `share/include` and I think it looks very strange to see `#include "include/xxx.h"` I agree with Kim it would be better to break this up into different issues so that we don't roadblock on "all or nothing". ------------- PR: https://git.openjdk.org/jdk/pull/11108 From fyang at openjdk.org Mon Nov 14 02:27:02 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 14 Nov 2022 02:27:02 GMT Subject: RFR: 8296916: RISC-V: Move some small macro-assembler functions to header file Message-ID: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> Witnessed that there are some small macro-assembler functions located in file macroAssembler_riscv.cpp. These are small functions which mostly contain only a single line of code. We should move them to the corresponding header file so that they have a chance to be inlined. Testing: Tier1 on linux-riscv64 HiFive unmatched board. 
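A rough sketch of the idea behind the RISC-V change described above, under stated assumptions: member functions defined inside the class body in a header are implicitly `inline`, so one-line wrappers become candidates for inlining at every call site. `TinyAssembler` and its members are hypothetical and much simpler than HotSpot's `MacroAssembler`; only the `addi`, `nop` and `mv` encodings are taken from the base RISC-V ISA.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical header-only assembler; not HotSpot code.
class TinyAssembler {
 public:
  // I-type encoding of `addi rd, rs1, imm` (opcode 0x13, funct3 0).
  void addi(unsigned rd, unsigned rs1, int32_t imm) {
    assert(rd < 32 && rs1 < 32);
    assert(imm >= -2048 && imm <= 2047);
    emit(((static_cast<uint32_t>(imm) & 0xFFFu) << 20) | (rs1 << 15) | (rd << 7) | 0x13u);
  }

  // One-line pseudo instructions: defined in the class body, hence
  // implicitly inline and visible to every translation unit that
  // includes this header.
  void nop()                        { addi(0, 0, 0); }
  void mv(unsigned rd, unsigned rs) { addi(rd, rs, 0); }

  const std::vector<uint32_t>& code() const { return _code; }

 private:
  void emit(uint32_t insn) { _code.push_back(insn); }
  std::vector<uint32_t> _code;
};
```

Had `nop` and `mv` been defined in a separate `.cpp` file, a caller in another translation unit would normally need link-time optimization before the compiler could inline them.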
------------- Commit messages: - 8296916: RISC-V: Move some small macro-assembler functions to header file Changes: https://git.openjdk.org/jdk/pull/11130/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11130&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296916 Stats: 281 lines in 3 files changed: 100 ins; 140 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/11130.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11130/head:pull/11130 PR: https://git.openjdk.org/jdk/pull/11130 From dholmes at openjdk.org Mon Nov 14 02:30:23 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 02:30:23 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled [v2] In-Reply-To: <ZAKMwzzFOAR6YsA_gXh7KSggKCieyLdwiKqTaNZUohU=.fa44c02f-57c2-4df0-9f85-1a4d4343c201@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <ZAKMwzzFOAR6YsA_gXh7KSggKCieyLdwiKqTaNZUohU=.fa44c02f-57c2-4df0-9f85-1a4d4343c201@github.com> Message-ID: <jzozfLUjKaaAzynbRsCnxgIOb437X_3Z6WHQ-aOgHGU=.06463d2f-0520-4b78-bc33-1ac208a5e8a8@github.com> On Sat, 12 Nov 2022 07:05:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. >> >> The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : >> >> >> #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(0) : NativeCallStack::empty_stack()) >> #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(1) : NativeCallStack::empty_stack()) >> >> >> and feed them to a callee routine, which usually has the argument defined via const reference, e.g. 
os::malloc: >> >> >> void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack); >> >> >> In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()). >> >> However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots): >> >> >> 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>: >> ... >> # Load tracking level >> cb9a77: 48 8d 1d 02 35 78 00 lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE> >> cb9a7e: 8b 03 mov (%rbx),%eax >> # detail (3) tracking? >> cb9a80: 83 f8 03 cmp $0x3,%eax >> # yes: go and collect callstack >> cb9a83: 0f 84 57 01 00 00 je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180> >> # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals: >> cb9a89: 48 8d 05 30 44 78 00 lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE> >> cb9a90: f3 0f 6f 00 movdqu (%rax),%xmm0 >> cb9a94: f3 0f 6f 48 10 movdqu 0x10(%rax),%xmm1 >> cb9a99: 0f 11 45 c0 movups %xmm0,-0x40(%rbp) >> cb9a9d: 0f 11 4d d0 movups %xmm1,-0x30(%rbp) >> ... >> # do the actual malloc: >> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc@plt> >> >> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX): >> cb9b0f: 48 8d 4d c0 lea -0x40(%rbp),%rcx >> ...
>> cb9b19: e8 f2 b7 f3 ff callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled. >> >> --------------------- >> >> The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only. >> >> This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC. >> >> In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway: >> >> >> 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>: >> ... >> # load tracking level >> cb9907: 48 8d 1d 72 46 78 00 lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE> >> cb990e: 8b 03 mov (%rbx),%eax >> # detail (3) tracking? >> cb9910: 83 f8 03 cmp $0x3,%eax >> # yes: go and collect callstack >> cb9913: 0f 84 37 01 00 00 je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160> >> # no: nothing more to do ... >> ... >> # do the actual malloc: >> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> >> ... >> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway: >> cb9987: 48 8d 4d c0 lea -0x40(%rbp),%rcx >> .. 
>> cb9991: e8 ba b8 f3 ff callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent. >> >> -------------- >> >> Results: >> >> When profiling, I see os::malloc now needs fewer cycles, and the hotspot around the xmm instructions is not there anymore. > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - reduce unnecessary diffs > - explicit constructor for fake callstacks; revert default ctor This doesn't seem unreasonable, though I can't comment on the details of what the compiler may or may not do here. A couple of nits below. If @iklam (or others) is okay with this I will also approve. Thanks. src/hotspot/share/utilities/nativeCallStack.hpp line 65: > 63: enum class FakeMarker { its_fake }; > 64: #ifdef ASSERT > 65: static constexpr uintptr_t _fake_address = Why not type this as `address` and save later casts? src/hotspot/share/utilities/nativeCallStack.hpp line 67: > 65: static constexpr uintptr_t _fake_address = > 66: (LP64_ONLY(0x4E4D54535441434BULL) // "NMTSTACK" > 67: NOT_LP64(0x4E4D5453)); // "NMTS" Why are the outer parentheses needed?
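The copy described in the quoted patch description can be reproduced in isolation. The following is a minimal sketch, assuming a C++17 compiler: when one arm of `?:` is a temporary and the other is a `const` reference to a singleton, the whole expression is a prvalue, so the singleton is copied whenever that arm is selected. `Stack`, `consume` and the two `copies_with_*` helpers are hypothetical stand-ins, not HotSpot's `NativeCallStack` or its macros.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4-frame call stack; not HotSpot code.
struct Stack {
  static int copies;                  // counts copy-constructor calls
  void* frames[4];

  Stack() {}                          // no-op: frames deliberately left uninitialized
  explicit Stack(int skip) : frames{} { (void)skip; }  // pretend to walk the stack
  Stack(const Stack& other) {
    for (int i = 0; i < 4; i++) frames[i] = other.frames[i];
    ++copies;
  }
  static const Stack& empty() { static const Stack s(0); return s; }
};
int Stack::copies = 0;

void consume(const Stack&) {}

// Mirrors the original macro shape: temporary vs. reference to a singleton.
int copies_with_singleton(bool detail) {
  int before = Stack::copies;
  consume(detail ? Stack(1) : Stack::empty());
  return Stack::copies - before;
}

// Mirrors the idea of the fix: both arms are temporaries, and the
// "off" arm uses a constructor that does no work.
int copies_with_default_ctor(bool detail) {
  int before = Stack::copies;
  consume(detail ? Stack(1) : Stack());
  return Stack::copies - before;
}

inline void check_copies() {
  assert(copies_with_singleton(false) >= 1);    // singleton content is copied
  assert(copies_with_default_ctor(false) == 0); // nothing is copied
}
```

The second helper only illustrates the direction of the change; the actual patch, per the update note, ended up with an explicit constructor for fake call stacks instead of a no-op default constructor.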
src/hotspot/share/utilities/nativeCallStack.hpp line 79: > 77: explicit NativeCallStack(FakeMarker dummy) { > 78: #ifdef ASSERT > 79: for (int i = 0; i < NMT_TrackingStackDepth; i ++) { Nit: no space before `++` (spaces go around binary operators) ------------- PR: https://git.openjdk.org/jdk/pull/11040 From fjiang at openjdk.org Mon Nov 14 02:41:13 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 14 Nov 2022 02:41:13 GMT Subject: RFR: 8296916: RISC-V: Move some small macro-assembler functions to header file In-Reply-To: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> References: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> Message-ID: <KRRecEfZa1i0BVnTp2us-m5PaNZ1JAZ7QVTpCxr045s=.31c0ffc7-0a67-41e6-b7b3-3aef6cc2cfde@github.com> On Mon, 14 Nov 2022 02:19:30 GMT, Fei Yang <fyang at openjdk.org> wrote: > Witnessed that there are some small macro-assembler functions located in file macroAssembler_riscv.cpp. > These are small functions which mostly contain only a single line of code. We should move them to the > corresponding header file so that they have a chance to be inlined. > > Testing: Tier1 on linux-riscv64 HiFive unmatched board. src/hotspot/cpu/riscv/assembler_riscv.hpp line 2757: > 2755: // Bit-manipulation extension pseudo instructions > 2756: // zero extend word > 2757: void zext_w(Register Rd, Register Rs) { Seems we can move `zext_w` to MacroAssembler too. 
------------- PR: https://git.openjdk.org/jdk/pull/11130 From dholmes at openjdk.org Mon Nov 14 02:51:13 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 02:51:13 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address In-Reply-To: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> Message-ID: <Kpe441SGn_42efi92acEGYSbLGbzLE2__jCqf9kpoUA=.893d018b-f758-4f0e-ac9c-449b968a413a@github.com> On Sun, 13 Nov 2022 09:01:09 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. > > The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. Did we not notice or was that exactly what was expected? When JDK-8065895 added that code it was known to generate SEGV _because_ it was outside the allowable range (otherwise there would have been no guarantee). Maybe the SI_KERNEL behaviour has changed since then? 
It doesn't seem an issue to change it to a low address (other than that doesn't work on AIX) but it seems odd to now consider it a bug - seems more like you now consider it too limited because the true address is not given in the sig info? ------------- PR: https://git.openjdk.org/jdk/pull/11122 From fyang at openjdk.org Mon Nov 14 02:58:20 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 14 Nov 2022 02:58:20 GMT Subject: RFR: 8296916: RISC-V: Move some small macro-assembler functions to header file [v2] In-Reply-To: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> References: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> Message-ID: <4HhSvQNYeFOwFEKahHQuORkDDT7q8_Ihyb8jlGzo5aY=.cefa5cc4-f3d6-4191-bd9e-b35582752cf4@github.com> > Witnessed that there are some small macro-assembler functions located in file macroAssembler_riscv.cpp. > These are small functions which mostly contain only a single line of code. We should move them to the > corresponding header file so that they have a chance to be inlined. > > Testing: Tier1 on linux-riscv64 HiFive unmatched board. 
Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11130/files - new: https://git.openjdk.org/jdk/pull/11130/files/47365eb2..363966e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11130&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11130&range=00-01 Stats: 12 lines in 2 files changed: 6 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11130.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11130/head:pull/11130 PR: https://git.openjdk.org/jdk/pull/11130 From fyang at openjdk.org Mon Nov 14 02:58:22 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 14 Nov 2022 02:58:22 GMT Subject: RFR: 8296916: RISC-V: Move some small macro-assembler functions to header file [v2] In-Reply-To: <KRRecEfZa1i0BVnTp2us-m5PaNZ1JAZ7QVTpCxr045s=.31c0ffc7-0a67-41e6-b7b3-3aef6cc2cfde@github.com> References: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> <KRRecEfZa1i0BVnTp2us-m5PaNZ1JAZ7QVTpCxr045s=.31c0ffc7-0a67-41e6-b7b3-3aef6cc2cfde@github.com> Message-ID: <tkHVVMJh4EH8R1XJBoUXfJqRP_T3RbNb7lCMBdFA_3s=.60bd6790-37e1-4779-9cc8-d2e1638acaa0@github.com> On Mon, 14 Nov 2022 02:34:27 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Review > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 2757: > >> 2755: // Bit-manipulation extension pseudo instructions >> 2756: // zero extend word >> 2757: void zext_w(Register Rd, Register Rs) { > > Seems we can move `zext_w` to MacroAssembler too. Done. But the caller should be aware that this actually uses instructions from the Bit-manipulation extension. I think the code comment should make this explicit. Thanks. 
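As a hedged aside on what `zext_w` computes: the sketch below models the semantics in plain C++, assuming (as in the Zba specification, to the best of my knowledge) that `zext.w rd, rs` is a pseudo instruction for `add.uw rd, rs, zero`. The function names are hypothetical and this is not the HotSpot implementation; the shift-based variant shows the two-instruction sequence a caller could fall back to on hardware without the extension.

```cpp
#include <cassert>
#include <cstdint>

// Model of Zba `add.uw rd, rs1, rs2`: rd = rs2 + zero_extend(rs1[31:0]).
inline uint64_t add_uw(uint64_t rs1, uint64_t rs2) {
  return rs2 + (rs1 & 0xFFFFFFFFull);
}

// Model of the pseudo instruction `zext.w rd, rs` (add.uw with x0).
inline uint64_t zext_w(uint64_t rs) {
  return add_uw(rs, 0);
}

// Base-ISA alternative: `slli rd, rs, 32` followed by `srli rd, rd, 32`.
inline uint64_t zext_w_base(uint64_t rs) {
  return (rs << 32) >> 32;
}

inline void check_zext_w() {
  const uint64_t v = 0xFFFFFFFF80000001ull;
  assert(zext_w(v) == 0x80000001ull);
  assert(zext_w_base(v) == zext_w(v));
}
```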
------------- PR: https://git.openjdk.org/jdk/pull/11130 From dholmes at openjdk.org Mon Nov 14 03:00:41 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 03:00:41 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address In-Reply-To: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> Message-ID: <AgOwBUgyeybR80EmKDvyqrkdSVJJk__cwt9KA-8FRFM=.37354650-98a0-4529-a872-143d8bd2f84d@github.com> On Sun, 13 Nov 2022 09:01:09 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. > > The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. This seems okay in principle but some issues with the test I think. Comments below. Thanks. test/hotspot/jtreg/runtime/ErrorHandling/HsErrFileUtils.java line 1: > 1: import jdk.test.lib.process.OutputAnalyzer; We need a copyright and license header here please. 
test/hotspot/jtreg/runtime/ErrorHandling/HsErrFileUtils.java line 15: > 13: > 14: // extract hs-err file > 15: String hs_err_file = output.firstMatch("# *(\\S*hs_err_pid\\d+\\.log)", 1); I'm not sure this is going to be useful in the way you are trying to use it. This will show the original path to the hs_err file at the time it is created. But jtreg can move things around in the final test result output and place the hs_err file somewhere else. test/hotspot/jtreg/runtime/ErrorHandling/TestSigInfoInHsErrFile.java line 2: > 1: /* > 2: * Copyright (c) 2013, 2020, Oracle and/or its affiliates. All rights reserved. New file should only have 2022 copyright year. test/hotspot/jtreg/runtime/ErrorHandling/TestSigInfoInHsErrFile.java line 54: > 52: import java.io.File; > 53: import java.io.FileInputStream; > 54: import java.io.InputStreamReader; Some of these includes seem unnecessary with the utility class you added. test/hotspot/jtreg/runtime/ErrorHandling/TestSigInfoInHsErrFile.java line 130: > 128: patterns.add(Pattern.compile("siginfo: si_signo: \\d+ \\(SIGSEGV\\), si_code: \\d+ \\(SEGV_MAPERR\\), si_addr: 0x0*400")); > 129: } else { > 130: patterns.add(Pattern.compile("siginfo: si_signo: \\d+ \\(SIGSEGV\\).*")); Why not use the AIX 5K address here? ------------- Changes requested by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11122 From fjiang at openjdk.org Mon Nov 14 03:01:43 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 14 Nov 2022 03:01:43 GMT Subject: RFR: 8296916: RISC-V: Move some small macro-assembler functions to header file [v2] In-Reply-To: <4HhSvQNYeFOwFEKahHQuORkDDT7q8_Ihyb8jlGzo5aY=.cefa5cc4-f3d6-4191-bd9e-b35582752cf4@github.com> References: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> <4HhSvQNYeFOwFEKahHQuORkDDT7q8_Ihyb8jlGzo5aY=.cefa5cc4-f3d6-4191-bd9e-b35582752cf4@github.com> Message-ID: <cUd4rSzfjzK8ol3rCs1fRMf-Lctmtoau_1yNS3NIyyM=.ed3a65d4-8c85-48f7-9810-e6d5a218e944@github.com> On Mon, 14 Nov 2022 02:58:20 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Witnessed that there are some small macro-assembler functions located in file macroAssembler_riscv.cpp. >> These are small functions which mostly contain only a single line of code. We should move them to the >> corresponding header file so that they have a chance to be inlined. >> >> Testing: Tier1 on linux-riscv64 HiFive unmatched board. > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Review Change looks good. ------------- Marked as reviewed by fjiang (Author). PR: https://git.openjdk.org/jdk/pull/11130 From xuelei at openjdk.org Mon Nov 14 03:27:05 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 14 Nov 2022 03:27:05 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v3] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <rm_c56ZppyLR9pXr-reIf9nHLx49S9XqfsJ04FPkevU=.778ae4bf-0d55-41d3-913d-d006f2def45b@github.com> > Hi, > > May I have this update reviewed? 
> > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. > > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: revert update for desktop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/a66f58bf..fe6893d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Mon Nov 14 03:27:09 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 14 Nov 2022 03:27:09 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v2] In-Reply-To: <Di-INP4TYouTaezufCIe4hYyclEwJkG-8NuA7LXad8M=.d94d0b86-3415-4383-b3db-80468ec01d44@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <nd7HSOhS0hdjnZWztbIPSm_jx2TdYZdWsgzmSGHsYzg=.d522c721-1233-43e8-be5e-9f4fbfb09ac2@github.com> <Di-INP4TYouTaezufCIe4hYyclEwJkG-8NuA7LXad8M=.d94d0b86-3415-4383-b3db-80468ec01d44@github.com> Message-ID: <lr_LEH-mWQ0qFPhw6zKAGyQi_rFxq-FbAsd_g4QQwd0=.30e8bc0d-aab1-44a5-b7db-3bde04e8a75a@github.com> On Mon, 14 Nov 2022 01:51:32 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with two additional commits since the last revision: >> >> - use os::snprintf for desktop update >> - use os::snprintf > > 
src/java.desktop/macosx/native/libjsound/PLATFORM_API_MacOSX_Ports.cpp line 638: > >> 636: return; >> 637: } >> 638: os::snprintf(channelName, 16, "Ch %d", ch); > > You can't use this here - this is not hotspot code! You are right. Reverted to use `snprintf` for desktop update. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Mon Nov 14 03:46:37 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 14 Nov 2022 03:46:37 GMT Subject: Withdrawn: 8296812: sprintf is deprecated in Xcode 14 In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <FutHL-ibq6jTikJTqSl0CqIxWvUKg9KwaqWRNn5yYF4=.3256ed91-6b91-46f8-b72c-becc9db3d620@github.com> On Fri, 11 Nov 2022 22:41:19 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. > > Thanks, > Xuelei This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/11115 From iklam at openjdk.org Mon Nov 14 03:48:05 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 14 Nov 2022 03:48:05 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled [v2] In-Reply-To: <ZAKMwzzFOAR6YsA_gXh7KSggKCieyLdwiKqTaNZUohU=.fa44c02f-57c2-4df0-9f85-1a4d4343c201@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <ZAKMwzzFOAR6YsA_gXh7KSggKCieyLdwiKqTaNZUohU=.fa44c02f-57c2-4df0-9f85-1a4d4343c201@github.com> Message-ID: <hpoQpgk-wHVvTD_0YOP2GSrlAvU3_7rmz9LJV8RZ7OQ=.87613a9a-8dea-49c5-a512-4b13c8086642@github.com> On Sat, 12 Nov 2022 07:05:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. >> >> The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : >> >> >> #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(0) : NativeCallStack::empty_stack()) >> #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(1) : NativeCallStack::empty_stack()) >> >> >> and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc: >> >> >> void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack); >> >> >> In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()). >> >> However, that does not work as intended. 
The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots):
>>
>>
>> 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>:
>> ...
>> # Load tracking level
>>   cb9a77:  48 8d 1d 02 35 78 00   lea    0x783502(%rip),%rbx   # 143cf80 <_ZN10MemTracker15_tracking_levelE>
>>   cb9a7e:  8b 03                  mov    (%rbx),%eax
>> # detail (3) tracking?
>>   cb9a80:  83 f8 03               cmp    $0x3,%eax
>> # yes: go and collect callstack
>>   cb9a83:  0f 84 57 01 00 00      je     cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180>
>> # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals:
>>   cb9a89:  48 8d 05 30 44 78 00   lea    0x784430(%rip),%rax   # 143dec0 <_ZN15NativeCallStack12_empty_stackE>
>>   cb9a90:  f3 0f 6f 00            movdqu (%rax),%xmm0
>>   cb9a94:  f3 0f 6f 48 10         movdqu 0x10(%rax),%xmm1
>>   cb9a99:  0f 11 45 c0            movups %xmm0,-0x40(%rbp)
>>   cb9a9d:  0f 11 4d d0            movups %xmm1,-0x30(%rbp)
>> ...
>> # do the actual malloc:
>>   cb9af8:  e8 c3 40 5d ff         callq  28dbc0 <malloc@plt>
>>
>> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX):
>>   cb9b0f:  48 8d 4d c0            lea    -0x40(%rbp),%rcx
>> ...
>>   cb9b19:  e8 f2 b7 f3 ff         callq  bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack>
>>
>>
>> This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled.
>>
>> ---------------------
>>
>> The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only.
>>
>> This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC.
>>
>> In the end I settled for replacing the explicit calls to `NativeCallStack::empty_stack()` with calls to the default constructor. I changed the default constructor to a no-op. Since the NativeCallStack object is then not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway:
>>
>>
>> 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>:
>> ...
>> # load tracking level
>>   cb9907:  48 8d 1d 72 46 78 00   lea    0x784672(%rip),%rbx   # 143df80 <_ZN10MemTracker15_tracking_levelE>
>>   cb990e:  8b 03                  mov    (%rbx),%eax
>> # detail (3) tracking?
>>   cb9910:  83 f8 03               cmp    $0x3,%eax
>> # yes: go and collect callstack
>>   cb9913:  0f 84 37 01 00 00      je     cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160>
>> # no: nothing more to do ...
>> ...
>> # do the actual malloc:
>>   cb9af8:  e8 c3 40 5d ff         callq  28dbc0 <malloc@plt>
>> ...
>> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway:
>>   cb9987:  48 8d 4d c0            lea    -0x40(%rbp),%rcx
>> ..
>>   cb9991:  e8 ba b8 f3 ff         callq  bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack>
>>
>>
>> There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent.
>>
>> --------------
>>
>> Results:
>>
>> When profiling, I see os::malloc now needs fewer cycles, and the hotspot around the xmm instructions is not there anymore.
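The copy described in this PR text can be reproduced outside HotSpot. The following is a minimal sketch, assuming C++17 semantics; `Stack`, `consume` and the two `copies_with_*` functions are hypothetical stand-ins for `NativeCallStack`, `os::malloc` and the two macro variants, not the actual HotSpot code. When one arm of `?:` is a temporary and the other is a reference to a singleton, the result is a temporary, so the singleton arm has to be copied.

```cpp
// Hypothetical stand-in for NativeCallStack; all names are illustrative only.
struct Stack {
    void* frames[4];
    static int copies;                          // counts copy-constructor calls

    Stack() {}                                  // no-op: frames stay uninitialized
    explicit Stack(int /*skip*/) : frames{} {}  // stands in for "walk the stack"
    Stack(const Stack& other) {
        for (int i = 0; i < 4; i++) frames[i] = other.frames[i];
        ++copies;
    }

    static const Stack& empty() {               // pre-created "empty stack" singleton
        static const Stack instance(0);
        return instance;
    }
};
int Stack::copies = 0;

// Callee taking the stack by const reference, like the os::malloc signature.
static void consume(const Stack&) {}

// Shape of the old macro: temporary on the left, singleton reference on the right.
// The conditional expression yields a temporary, so the right arm gets copied.
int copies_with_singleton(bool detail) {
    Stack::copies = 0;
    consume(detail ? Stack(0) : Stack::empty());
    return Stack::copies;
}

// Shape of the patched macro: both arms are temporaries, nothing is copied.
int copies_with_default_ctor(bool detail) {
    Stack::copies = 0;
    consume(detail ? Stack(0) : Stack());
    return Stack::copies;
}
```

Calling `copies_with_singleton(false)` reports at least one copy, while `copies_with_default_ctor(false)` reports none, which matches the disappearance of the `movdqu`/`movups` sequence in the second listing.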
>
> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
>
> - reduce unnecessary diffs
> - explicit constructor for fake callstacks; revert default ctor

src/hotspot/share/utilities/nativeCallStack.hpp line 67:

> 65:   static constexpr uintptr_t _fake_address =
> 66:       (LP64_ONLY(0x4E4D54535441434BULL) // "NMTSTACK"
> 67:        NOT_LP64(0x4E4D5453));           // "NMTS"

There's no guarantee that these addresses will never be a valid return address, so you may get spurious assertion failures. Since this is used only in debug mode for assertion, I think it's much better to have an extra field `bool _is_fake`.

-------------

PR: https://git.openjdk.org/jdk/pull/11040

From jwaters at openjdk.org Mon Nov 14 04:14:24 2022
From: jwaters at openjdk.org (Julian Waters)
Date: Mon, 14 Nov 2022 04:14:24 GMT
Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v2]
In-Reply-To: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com>
References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com>
Message-ID: <x8nDS5oBvPgUmNTQie92RqlCGQpIvXY2Ribuu-YIeg0=.541a91ec-7dda-49b5-a274-6c127d6b1039@github.com>

> After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler.
These cleanups were highlighted by the very briefly integrated 8296115 > > No changes to the behaviour of the JDK has resulted in any way from this commit Julian Waters has updated the pull request incrementally with one additional commit since the last revision: ATTRIBUTE_SCANF ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11081/files - new: https://git.openjdk.org/jdk/pull/11081/files/6d85c432..bb3ef0dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11081&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11081&range=00-01 Stats: 8 lines in 2 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/11081.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11081/head:pull/11081 PR: https://git.openjdk.org/jdk/pull/11081 From jwaters at openjdk.org Mon Nov 14 04:14:26 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 14 Nov 2022 04:14:26 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v2] In-Reply-To: <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> Message-ID: <xa0OxJAKuXs1pITkUZKcljgKl00lHRp9XUdMrknO9J4=.6aa6f913-0739-4bf8-a05a-bd77a7e8bfdd@github.com> On Sun, 13 Nov 2022 22:58:11 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> ATTRIBUTE_SCANF > > src/hotspot/os/linux/os_perf_linux.cpp line 233: > >> 231: * Ensure that 'fmt' does _NOT_ contain the first two "%d %s" >> 232: */ >> 233: SCANF_ARGS(2, 0) static int vread_statdata(const char* procfile, _SCANFMT_ const char* fmt, va_list args) { > > If `SCANF_ARGS` can/must come first then I suggest adding a newline after it so the method signature is 
easier to spot. Applied everywhere of course. Resolved, thanks ------------- PR: https://git.openjdk.org/jdk/pull/11081 From dholmes at openjdk.org Mon Nov 14 05:18:23 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 05:18:23 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v2] In-Reply-To: <YtHH6c7C6qbfZgFogPjZbJO0WISUGkNAjRaPG09UaXM=.ec25e9eb-e607-429e-a6f4-c54a1edd662c@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <YtHH6c7C6qbfZgFogPjZbJO0WISUGkNAjRaPG09UaXM=.ec25e9eb-e607-429e-a6f4-c54a1edd662c@github.com> Message-ID: <ZvxcBQflZf4qxygcXOsdL13zBeyww-V0lY-otI69kS4=.f2145e5f-8a38-4cd1-9ed2-a364d01b8128@github.com> On Sun, 13 Nov 2022 11:32:46 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. >> >> To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. >> >> --- >> >> Patch >> >> - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. >> - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. >> - Removed a stray newline from print_native_stack to clean output. >> - added regression testing for this feature. I removed my name from the test since we don't do this anymore. 
>> - added clarifying comments to the test and code >> - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) >> >> Output looks like this: >> >> >> $ java ... -XX:+ErrorLogSecondaryErrorDetails >> >> >> will produce, for secondary errors, siginfo and call stack. >> >> >> [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] >> [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] >> [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) >> V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) >> V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) >> V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) >> V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) >> V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) >> C [libc.so.6+0x43090] >> V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) >> C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) >> C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) >> ] > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/runtime/ErrorHandling/SecondaryErrorTest.java > > remove blank > > Co-authored-by: Andrey Turbanov <turbanoff at gmail.com> Seems reasonable in terms of the code, but I am curious how you expect this to be used in practice? 
If you get a secondary crash with this turned off then I would expect to use gdb on the core file to get the secondary stack, rather than using the new flag and hoping to reproduce the problem. To be useful it seems you would have to remember to run with this always enabled when developing/debugging. ?? src/hotspot/share/utilities/vmError.cpp line 1631: > 1629: os::infinite_sleep(); > 1630: } else { > 1631: // A secondary error happened. Print a much abridged information, but take care, since crashing Suggestion: "Print brief information, but ..." test/hotspot/jtreg/runtime/ErrorHandling/SecondaryErrorTest.java line 72: > 70: with_callstacks = false; > 71: } else { > 72: throw new IllegalArgumentException("unknown argument"); Nit: include the unknown argument in the message ------------- PR: https://git.openjdk.org/jdk/pull/11118 From xuelei at openjdk.org Mon Nov 14 05:32:20 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 14 Nov 2022 05:32:20 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v4] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <TwmQmg7Canmom_CSvAthOQIbBZaMPLXLfgaudOsoZD0=.bfb6f137-df79-40cf-b6d6-89b975832d66@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: include missing os head file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/fe6893d5..128bc806 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From thartmann at openjdk.org Mon Nov 14 06:16:33 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Nov 2022 06:16:33 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> Message-ID: <KnTiK779kROmirKS9GkIYTO1zTQF5rc9GShXEIQ0fEw=.2b58eaf3-dc1d-4053-9b84-c86a770dbf91@github.com> On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. 
>>
>> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>>
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
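To illustrate what "unrolling the polynomial hash" means in this thread, here is a minimal scalar sketch. It is not the intrinsic or the library code from the PR: the function names are hypothetical, the unroll factor of 4 is an arbitrary choice, bytes are treated as unsigned (as for Latin1 strings), and `uint32_t` wraparound stands in for Java's `int` overflow. Only the factor 31 and the start values (0 for `String`, 1 for `Arrays`) come from the description above.

```cpp
#include <cstddef>
#include <cstdint>

// Reference loop: h = 31*h + b for every byte, with 32-bit wraparound.
uint32_t poly_hash_simple(const uint8_t* data, std::size_t len, uint32_t initial) {
    uint32_t h = initial;
    for (std::size_t i = 0; i < len; i++) {
        h = 31u * h + data[i];
    }
    return h;
}

// Same result, four elements per iteration. The constants are 31^4, 31^3
// and 31^2; the four multiplications no longer depend on each other, which
// shortens the dependency chain through h.
uint32_t poly_hash_unrolled(const uint8_t* data, std::size_t len, uint32_t initial) {
    uint32_t h = initial;
    std::size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        h = 923521u * h
          + 29791u  * data[i]
          + 961u    * data[i + 1]
          + 31u     * data[i + 2]
          +           data[i + 3];
    }
    for (; i < len; i++) {          // tail: fewer than four elements left
        h = 31u * h + data[i];
    }
    return h;
}
```

Both functions agree for any input because expanding four steps of `h = 31*h + b` gives exactly the unrolled expression; the vectorized variants discussed in the thread build on the same identity with wider strides.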
> > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Missing & 0xff in StringLatin1::hashCode For the record, we have [JDK-8233300](https://bugs.openjdk.org/browse/JDK-8233300) to investigate safepoint-aware intrinsics. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From stuefe at openjdk.org Mon Nov 14 06:44:23 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 06:44:23 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address In-Reply-To: <Kpe441SGn_42efi92acEGYSbLGbzLE2__jCqf9kpoUA=.893d018b-f758-4f0e-ac9c-449b968a413a@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> <Kpe441SGn_42efi92acEGYSbLGbzLE2__jCqf9kpoUA=.893d018b-f758-4f0e-ac9c-449b968a413a@github.com> Message-ID: <W2dP9QHYcpLsLoHyl6hygw6wDOxtbW-otb1XWPSv0QU=.7c72ac80-dc9f-4a75-b14a-5cd9a870c988@github.com> On Mon, 14 Nov 2022 02:49:01 GMT, David Holmes <dholmes at openjdk.org> wrote: > Did we not notice or was that exactly what was expected? When JDK-8065895 added that code it was known to generate SEGV _because_ it was outside the allowable range (otherwise there would have been no guarantee). Maybe the SI_KERNEL behaviour has changed since then? It doesn't seem an issue to change it to a low address (other than that doesn't work on AIX) but it seems odd to now consider it a bug - seems more like you now consider it too limited because the true address is not given in the sig info? My intent is to have a regression test that clearly shows the expected crash address and si_code in the hs-err file. Especially since we now have more developers changing error reporting. Regression tests give you the safety to try new approaches without having to manually check for problems. 
> test/hotspot/jtreg/runtime/ErrorHandling/HsErrFileUtils.java line 15: > >> 13: >> 14: // extract hs-err file >> 15: String hs_err_file = output.firstMatch("# *(\\S*hs_err_pid\\d+\\.log)", 1); > > I'm not sure this is going to be useful in the way you are trying to use it. This will show the original path to the hs_err file at the time it is created. But jtreg can move things around in the final test result output and place the hs_err file somewhere else. We use this pattern in a number of places, e.g. in BadNativeStackInErrorHandlingTest and SafeFetchInErrorHandlingTest.java. Seems to work ok so far. > test/hotspot/jtreg/runtime/ErrorHandling/TestSigInfoInHsErrFile.java line 130: > >> 128: patterns.add(Pattern.compile("siginfo: si_signo: \\d+ \\(SIGSEGV\\), si_code: \\d+ \\(SEGV_MAPERR\\), si_addr: 0x0*400")); >> 129: } else { >> 130: patterns.add(Pattern.compile("siginfo: si_signo: \\d+ \\(SIGSEGV\\).*")); > > Why not use the AIX 5K address here? Oversight. I have no way to test AIX anymore since we handed porting over to IBM. 
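The siginfo pattern quoted from the test can be tried outside jtreg as well. The sketch below uses `std::regex` with the same expression; `is_expected_siginfo` is a hypothetical helper, and the sample lines in the usage note are constructed from the siginfo format shown elsewhere in this thread, not copied from a real hs-err file.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the line reports SIGSEGV with SEGV_MAPERR
// at the crash address 0x400 (any number of leading zeros).
bool is_expected_siginfo(const std::string& line) {
    static const std::regex pattern(
        R"(siginfo: si_signo: \d+ \(SIGSEGV\), si_code: \d+ \(SEGV_MAPERR\), si_addr: 0x0*400)");
    return std::regex_search(line, pattern);
}
```

A line such as `siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000400` is accepted, while the `SI_KERNEL` variant that Linux produced for the old crash address is rejected.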
------------- PR: https://git.openjdk.org/jdk/pull/11122 From stuefe at openjdk.org Mon Nov 14 07:04:32 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 07:04:32 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v2] In-Reply-To: <ZvxcBQflZf4qxygcXOsdL13zBeyww-V0lY-otI69kS4=.f2145e5f-8a38-4cd1-9ed2-a364d01b8128@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <YtHH6c7C6qbfZgFogPjZbJO0WISUGkNAjRaPG09UaXM=.ec25e9eb-e607-429e-a6f4-c54a1edd662c@github.com> <ZvxcBQflZf4qxygcXOsdL13zBeyww-V0lY-otI69kS4=.f2145e5f-8a38-4cd1-9ed2-a364d01b8128@github.com> Message-ID: <7bHuyyWSn-IWFNJ6J_JFFWe93DUHdbirP5aDn-tVAqg=.fa0b8dee-01ce-4cb2-adb9-4217d87ea9c9@github.com> On Mon, 14 Nov 2022 05:16:05 GMT, David Holmes <dholmes at openjdk.org> wrote: > Seems reasonable in terms of the code, but I am curious how you expect this to be used in practice? If you get a secondary crash with this turned off then I would expect to use gdb on the core file to get the secondary stack, rather than using the new flag and hoping to reproduce the problem. To be useful it seems you would have to remember to run with this always enabled when developing/debugging. ?? Yes, sure. It is easier than getting the code and starting gdb, especially if you have no access to the core. If the secondary error is reproducible, and they often are, and the developer's goal is to fix them, he sets the option. Even if they are not reproducible - if people get annoyed enough, we can just enable it for tests where they regularly happen. The option does not cost anything in terms of normal runtime. We also could always enable it in runtime/ErrorHandling tests. At SAP, we even have test specifically for sniffing out and fixing secondary crashes, but since they are work-intensive in practice we never contributed them. Not sure if you followed https://github.com/openjdk/jdk/pull/11017. 
Here and in other RFEs we argued about the granularity of error-reporting STEPs with their secondary-signal-catch capability. There were voices for making them more fine-grained. But that would cause other problems, and therefore it would be good to harden the error reporting code instead. This RFE tries to make this work a little less onerous. ------------- PR: https://git.openjdk.org/jdk/pull/11118 From stuefe at openjdk.org Mon Nov 14 07:10:39 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 07:10:39 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v2] In-Reply-To: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> Message-ID: <925asUQVBWNrlwBiUMVJLWMFpPsKeAG40UYd8hI90pc=.364ae2b4-f7fb-437b-92d3-6825cb5e8879@github.com> > We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is supposed to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. > > The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files.
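To illustrate the address-range point in the description above, here is a minimal sketch. It assumes x86-64 Linux with 4-level paging, where user space ends below 2^47; other architectures and paging modes use different limits. The names `kUserSpaceLimit`, `kOldCrashAddress` and `is_user_space_address` are hypothetical and do not come from HotSpot.

```cpp
#include <cstdint>

// Assumption: x86-64 Linux with 4-level paging, where user-space
// addresses lie below 2^47. Illustration only, not HotSpot code.
constexpr std::uint64_t kUserSpaceLimit = std::uint64_t{1} << 47;

// The crash address quoted in the PR description.
constexpr std::uint64_t kOldCrashAddress = 0xABC0000000000ABCULL;

// True if addr falls inside the assumed user-space range. An access
// outside it cannot be reported as an ordinary unmapped-page fault.
constexpr bool is_user_space_address(std::uint64_t addr) {
  return addr < kUserSpaceLimit;
}
```

Under that assumption the old address is rejected by the check, while a small address such as 0x400 passes it, which is consistent with the PR's observation that only an address inside the user-space range yields SEGV_MAPERR.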
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: feedback david ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11122/files - new: https://git.openjdk.org/jdk/pull/11122/files/8c67aa70..33b758ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11122&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11122&range=00-01 Stats: 36 lines in 2 files changed: 25 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/11122.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11122/head:pull/11122 PR: https://git.openjdk.org/jdk/pull/11122 From stuefe at openjdk.org Mon Nov 14 07:27:17 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 07:27:17 GMT Subject: RFR: 8296796: Provide clean, platform-agnostic interface to C-heap trimming In-Reply-To: <0eSjQ4TmTMudluqBAowiwXl4hpgwDXvQYEy9gRsKTEo=.82c8ca83-9a1a-458f-9909-1531c484f7fb@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> <0eSjQ4TmTMudluqBAowiwXl4hpgwDXvQYEy9gRsKTEo=.82c8ca83-9a1a-458f-9909-1531c484f7fb@github.com> Message-ID: <wCFaJsuA9UKuo0DbWfmzwTfmzfOBw3iuDCnW45aJYPc=.c5c2c25f-14be-4890-8938-e288259a74ae@github.com> On Mon, 14 Nov 2022 01:35:35 GMT, David Holmes <dholmes at openjdk.org> wrote: > This looks good for doing what it says, but I have to wonder whether it is actually worthwhile doing this unless most OS/lib will support it? What will the implementation be in AIX? I think C-Heap trimming is useful even if only Linux does it. Linux is arguably the most important platform. And https://github.com/openjdk/jdk/pull/10085 would bring demonstratable benefits but did not garner a lot of interest. So I hope to speed it up by splitting parts that are hopefully non-contentious into separate RFEs. I know that AIX has an API to disclaim memory, but have not yet looked deeply into the integration. 
I also cannot rule out that other platforms may give us similar APIs in the future. ------------- PR: https://git.openjdk.org/jdk/pull/11089 From stuefe at openjdk.org Mon Nov 14 07:29:25 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 07:29:25 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v3] In-Reply-To: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> Message-ID: <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> > This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. > > To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. > > --- > > Patch > > - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. > - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. > - Removed a stray newline from print_native_stack to clean output. > - added regression testing for this feature. I removed my name from the test since we don't do this anymore. > - added clarifying comments to the test and code > - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) > > Output looks like this: > > > $ java ... 
-XX:+ErrorLogSecondaryErrorDetails > > > will produce, for secondary errors, siginfo and call stack. > > > [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] > [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] > [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) > V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) > V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) > V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) > V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) > V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) > C [libc.so.6+0x43090] > V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) > C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) > C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) > ] Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors' of github.com:tstuefe/jdk into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors - Feedback David ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11118/files - new: https://git.openjdk.org/jdk/pull/11118/files/52ca0ff7..7b3a506a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: 
https://git.openjdk.org/jdk/pull/11118.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11118/head:pull/11118 PR: https://git.openjdk.org/jdk/pull/11118 From stuefe at openjdk.org Mon Nov 14 07:32:25 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 07:32:25 GMT Subject: RFR: 8296796: Provide clean, platform-agnostic interface to C-heap trimming [v2] In-Reply-To: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> Message-ID: <vBtNU4U04jMPutD1fcypaDk77fqS3pAbiVg7QHZa1i4=.624dd7f0-3500-4153-9c5f-56df44eef1f2@github.com> > This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I am attempting to break up that fix into smaller units which are hopefully easier to review separately. > > We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. That will allow us to add implementations for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier.
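As a rough illustration of what an OS-agnostic trimming interface could look like, here is a minimal sketch. It is not the interface from the patch: the function names are hypothetical, and the only real API used is glibc's `malloc_trim`. Platforms without glibc simply report that trimming is unsupported.

```cpp
#include <cstdlib>   // any C library header makes __GLIBC__ visible on glibc

#if defined(__GLIBC__)
#include <malloc.h>  // malloc_trim is a glibc extension
#endif

// Hypothetical names for illustration; not the HotSpot interface.
inline bool can_trim_native_heap() {
#if defined(__GLIBC__)
  return true;
#else
  return false;   // no known trimming API on this platform
#endif
}

// Returns true if a trim was attempted, false if the platform offers
// no way to do it. Whether memory was actually released is ignored here.
inline bool trim_native_heap() {
#if defined(__GLIBC__)
  (void)::malloc_trim(0);  // pad = 0: release as much as possible
  return true;
#else
  return false;
#endif
}
```

Callers only ask whether trimming is available and then request it, so a later platform-specific implementation (for example for AIX) would slot in behind the same two functions without touching call sites.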
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Feedback David ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11089/files - new: https://git.openjdk.org/jdk/pull/11089/files/3b93fb89..1a642e0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11089&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11089&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11089.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11089/head:pull/11089 PR: https://git.openjdk.org/jdk/pull/11089 From stuefe at openjdk.org Mon Nov 14 07:41:57 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 07:41:57 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled [v2] In-Reply-To: <hpoQpgk-wHVvTD_0YOP2GSrlAvU3_7rmz9LJV8RZ7OQ=.87613a9a-8dea-49c5-a512-4b13c8086642@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <ZAKMwzzFOAR6YsA_gXh7KSggKCieyLdwiKqTaNZUohU=.fa44c02f-57c2-4df0-9f85-1a4d4343c201@github.com> <hpoQpgk-wHVvTD_0YOP2GSrlAvU3_7rmz9LJV8RZ7OQ=.87613a9a-8dea-49c5-a512-4b13c8086642@github.com> Message-ID: <gY2TcmJzHK1I802SiQwH2gvliwbXXbkE0Ld7Vrh1tks=.6ef14619-bfd8-4220-ade4-a166e6053d6f@github.com> On Mon, 14 Nov 2022 03:44:35 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - reduce unnecessary diffs >> - explicit constructor for fake callstacks; revert default ctor > > src/hotspot/share/utilities/nativeCallStack.hpp line 67: > >> 65: static constexpr uintptr_t _fake_address = >> 66: (LP64_ONLY(0x4E4D54535441434BULL) // "NMTSTACK" >> 67: NOT_LP64(0x4E4D5453)); // "NMTS" > > There's no guarantee that these addresses will never be a valid return address, so you may get suprious assertion failures. 
Since this is used only in debug mode for assertion, I think it's much better to have an extra field `bool _is_fake`. Note that spurious asserts would be *extremely* unlikely. It would involve - a 32-bit platform, since for 64-bit the chance is astronomically low and we are out of the usable address range for user space at least on Linux anyway. - a platform where code pointers can be unaligned, so it rules out all but x86 and s390 - NMT on and in detail mode - a debug build. Adding an explicit bool would increase the size of NMT callstacks by at least 32 bits. While that is not a show-stopper, I'd like to avoid it if possible. If we restrict it to debug only, we have a different memory layout between debug and release, which I don't like much either. I'll try to think of something. ------------- PR: https://git.openjdk.org/jdk/pull/11040 From stuefe at openjdk.org Mon Nov 14 07:41:59 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 07:41:59 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled [v2] In-Reply-To: <jzozfLUjKaaAzynbRsCnxgIOb437X_3Z6WHQ-aOgHGU=.06463d2f-0520-4b78-bc33-1ac208a5e8a8@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <ZAKMwzzFOAR6YsA_gXh7KSggKCieyLdwiKqTaNZUohU=.fa44c02f-57c2-4df0-9f85-1a4d4343c201@github.com> <jzozfLUjKaaAzynbRsCnxgIOb437X_3Z6WHQ-aOgHGU=.06463d2f-0520-4b78-bc33-1ac208a5e8a8@github.com> Message-ID: <7ui6e0kldyjsrhShhbEehhK7VeFgOdlvNJ9ebKCfgzE=.03064b47-c224-4b91-a00b-ebe47284a431@github.com> On Mon, 14 Nov 2022 02:21:25 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - reduce unnecessary diffs >> - explicit constructor for fake callstacks; revert default ctor > > src/hotspot/share/utilities/nativeCallStack.hpp line 67: > >> 65: static constexpr uintptr_t _fake_address = >> 66: (LP64_ONLY(0x4E4D54535441434BULL) 
// "NMTSTACK" >> 67: NOT_LP64(0x4E4D5453)); // "NMTS" > > Why are the outer parentheses needed? You are right, I'll remove them. ------------- PR: https://git.openjdk.org/jdk/pull/11040 From stefank at openjdk.org Mon Nov 14 08:01:32 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 14 Nov 2022 08:01:32 GMT Subject: RFR: 8296886: Fix various include sort order issues In-Reply-To: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> Message-ID: <79P1w9DIxbQlPsfei2SPI11TOqry6l8JzT4gd8EKtuI=.5d1e8298-0f0d-4153-9558-890bbf9c46e3@github.com> On Fri, 11 Nov 2022 14:26:20 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. > > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. 
> > While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards. I'll split this out into separate changes. * Regarding `#include "include/xxx.h"`: This used to be `#include "prims/xxx.h"` and sorted with the rest of the header files. Then they got moved to include/ and the directory was dropped for some reason. @dholmes-ora maybe you have a better name than `include/`, that is less eye-jarring? * Remember, a lot of the order we used was generated by a script. Some of the complaints above are against the style that was used ever since we added these includes to the files. * System files come after the HotSpot includes, with a blank line before. * Platform files don't necessarily use the macro stems and then don't sort after the shared includes. I'm fine with changing this, but that would have to be tackled as a separate proposal. * Sounds like a good proposal to move the external, shared headers like jimage.h and jni.h to a separate section *after* the HotSpot includes. * I also remember that unittest.h needs to go at the end of the list, and have seen many violations of that. I added a blank line as a suggestion that this might give a visual aid of what's going on. * <new> is typically provided via allocation.hpp, but I can leave the few odd <new> includes.
------------- PR: https://git.openjdk.org/jdk/pull/11108 From stuefe at openjdk.org Mon Nov 14 08:05:49 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 08:05:49 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 In-Reply-To: <-ER02i5fNCaZX-v56giR7gCbuMkwwpdhg5LBLThVvds=.402ad601-58e3-4529-935a-151e5e76b2b3@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <klNVgLaAprREVI2aALAP1V9p7KHz_B2pyUhoFBJgqvo=.6742030d-5184-44e6-9b03-0c59c2a8d8a6@github.com> <-ER02i5fNCaZX-v56giR7gCbuMkwwpdhg5LBLThVvds=.402ad601-58e3-4529-935a-151e5e76b2b3@github.com> Message-ID: <9fb7vy703UZemRy0XnH4NQkPINCR4FzPvjwp3MFhJqk=.56f02d33-0156-48be-862d-7972d36d19a8@github.com> On Sun, 13 Nov 2022 22:55:52 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: > Please don't add uses of `jio_snprintf` or `::snprintf` to hotspot. Use `os::snprintf`. I did not know this was our policy now. Sorry for giving the wrong advice. Maybe we should add this to the hotspot style guide since I'm probably not the only one not knowing this. > > Regarding `jio_snprintf`, see https://bugs.openjdk.org/browse/JDK-8198918. Regarding `os::snprintf` and `os::vsnprintf`, see https://bugs.openjdk.org/browse/JDK-8285506. > > I think the only reason we haven't marked `::sprintf` and `::snprintf` forbidden (FORBID_C_FUNCTION) is there are a lot of uses, and nobody has gotten around to dealing with it. `::snprintf` in the list of candidates for https://bugs.openjdk.org/browse/JDK-8214976, some of which have already been marked. But I don't see new bugs for the as-yet unmarked ones. > > As a general note, as a reviewer my preference is against non-trivial and persnickety code changes that are scattered all over the code base. For something like this I'd prefer multiple more bite-sized changes that were dealing with specific uses. I doubt everyone agrees with me though. I agree with you. Makes backporting a bit easier too. 
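For readers outside HotSpot, the reason bounded formatting is preferred over `sprintf` can be sketched as follows. This is not `os::snprintf` itself; it uses the standard `std::vsnprintf` as a stand-in, and `format_bounded` is a hypothetical helper name. The sketch reports truncation instead of silently overrunning or cutting off the buffer.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats into buf (capacity len, which must be > 0)
// and returns true only if the whole result fit, including the
// terminating NUL. On truncation the buffer is still NUL-terminated.
inline bool format_bounded(char* buf, std::size_t len, const char* fmt, ...) {
  va_list args;
  va_start(args, fmt);
  // vsnprintf returns the length the full output would have had,
  // or a negative value on an encoding error.
  const int needed = std::vsnprintf(buf, len, fmt, args);
  va_end(args);
  return needed >= 0 && static_cast<std::size_t>(needed) < len;
}
```

A caller can then decide what truncation means at that site (assert, grow the buffer, or accept the shortened text), which is the kind of per-use decision that makes small, targeted conversions easier to review than one sweeping change.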
------------- PR: https://git.openjdk.org/jdk/pull/11115 From dholmes at openjdk.org Mon Nov 14 08:07:34 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 08:07:34 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v2] In-Reply-To: <0fVP40VVRuOoZCEJ1M3BLubshBHbD4m_lj-j1qaGTTk=.391ade50-b3fe-4d4e-ae71-ba8a975a31cd@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <0fVP40VVRuOoZCEJ1M3BLubshBHbD4m_lj-j1qaGTTk=.391ade50-b3fe-4d4e-ae71-ba8a975a31cd@github.com> Message-ID: <mowWW-lO9a5Zo4iT-sbh1YTZFXO4UjKuJK7OpfrhFFo=.c7e5430a-de42-4465-ba85-10dfb8c71184@github.com> On Mon, 14 Nov 2022 01:17:38 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> src/hotspot/os/windows/os_windows.hpp line 35: >> >>> 33: class Thread; >>> 34: >>> 35: static unsigned __stdcall thread_native_entry(Thread*); >> >> Why was this removed? This is needed to correctly specify the call sequence for the thread entry routine when used with `_beginThreadex`: >> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/beginthread-beginthreadex?view=msvc-170 > > I'm not sure I follow, I didn't remove anything here? Sorry my eyes must be playing tricks on me. ?? Why did you need to add this here? >> src/hotspot/share/utilities/compilerWarnings.hpp line 47: >> >>> 45: #endif >>> 46: >>> 47: #ifndef PRAGMA_DISABLE_VISCPP_WARNING >> >> Why rename this from `MSVC` to `VISCPP`? IIRC the full name is Microsft Visual Studio C++, so you new name is not obviously better and the change just adds noise to the PR. Further `MSVC` matches what MS themselves use and even the attribute namespace in C++11 is `MSVC`. 
>> Update: I see the inconsistency with `compilerWarnings_visCPP.hpp` > > Yep, it was renamed since the file is also named VISCPP, and I felt that matching the names was a good style change I think it is the file that has the "bad" name in this case. :( But okay. >> src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 37: >> >>> 35: #endif >>> 36: >>> 37: #if defined(__clang_major__) && \ >> >> Not clear why this was moved ?? > > I'm not sure which one you're referring to, but the PRAGMA_DIAG_PUSH/POP was moved up to the top of the header to match compilerWarnings_visCPP.hpp, and PRAGMA_DISABLE_GCC_WARNING_AUX was moved to macros.hpp as the more general PRAGMA macro, since it's useful for all compilers and not just gcc Okay ------------- PR: https://git.openjdk.org/jdk/pull/11081 From dholmes at openjdk.org Mon Nov 14 08:10:45 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 08:10:45 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v2] In-Reply-To: <JifpzUjotcu252D3-usFaP6YYOm26UkGxNigCl8TyS4=.ad93c4a9-b88d-40f5-adc4-461e78311a5b@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <JifpzUjotcu252D3-usFaP6YYOm26UkGxNigCl8TyS4=.ad93c4a9-b88d-40f5-adc4-461e78311a5b@github.com> Message-ID: <_hwdz6oHBktC95v1xdiK-szAPmho2bxrDMCWBUAPGo0=.2ff0735d-5fda-4322-ac1f-1b4bec31dc56@github.com> On Mon, 14 Nov 2022 01:39:17 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> src/hotspot/share/utilities/debug.hpp line 172: >> >>> 170: int status, const char* detail); >>> 171: ATTRIBUTE_PRINTF(4, 5) >>> 172: void report_fatal(VMErrorType error_type, const char* file, int line, const char* detail_fmt, ...); >> >> Why were the ATTRIBUTE_PRINTFs removed? 
> > The ATTRIBUTE_PRINTF macros are still there, just moved in front of the methods Wow I'm really having eyesight problems today! Sorry about that. ------------- PR: https://git.openjdk.org/jdk/pull/11081 From stuefe at openjdk.org Mon Nov 14 08:30:35 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 08:30:35 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v2] In-Reply-To: <x8nDS5oBvPgUmNTQie92RqlCGQpIvXY2Ribuu-YIeg0=.541a91ec-7dda-49b5-a274-6c127d6b1039@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <x8nDS5oBvPgUmNTQie92RqlCGQpIvXY2Ribuu-YIeg0=.541a91ec-7dda-49b5-a274-6c127d6b1039@github.com> Message-ID: <vIdqGyQkQAXZauPAFCN2CkP_sG-N2nN90_HYrOHisYA=.e704c782-83d0-4464-b646-75f2cda0b686@github.com> On Mon, 14 Nov 2022 04:14:24 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler. These cleanups were highlighted by the very briefly integrated 8296115 >> >> No changes to the behaviour of the JDK has resulted in any way from this commit > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > ATTRIBUTE_SCANF Hi Julian, unfortunately, your patch will make backporting more difficult. We cannot downport it to older releases compiled with older compilers. But since it touches a lot of files it will sit smack in the middle of patch sequences, requiring manual merges for patches after it. Is there any benefit to using the new syntax compared to the old one? It does seem similar verbose, so I don't see any benefit there. 
Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/11081 From stuefe at openjdk.org Mon Nov 14 09:26:47 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 09:26:47 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled [v3] In-Reply-To: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> Message-ID: <ydQdOtTlFJ407X_AGYWpPSksUJEjvs7B9zZxsOO7EKU=.5fd56bb1-dc5c-4b8f-8891-d44cfec095d4@github.com> > While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. > > The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : > > > #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \ > NativeCallStack(0) : NativeCallStack::empty_stack()) > #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ? \ > NativeCallStack(1) : NativeCallStack::empty_stack()) > > > and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc: > > > void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack); > > > In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()). > > However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. 
It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots): > > > 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>: > ... > # Load tracking level > cb9a77: 48 8d 1d 02 35 78 00 lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE> > cb9a7e: 8b 03 mov (%rbx),%eax > # detail (3) tracking? > cb9a80: 83 f8 03 cmp $0x3,%eax > # yes: go and collect callstack > cb9a83: 0f 84 57 01 00 00 je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180> > # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals: > cb9a89: 48 8d 05 30 44 78 00 lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE> > cb9a90: f3 0f 6f 00 movdqu (%rax),%xmm0 > cb9a94: f3 0f 6f 48 10 movdqu 0x10(%rax),%xmm1 > cb9a99: 0f 11 45 c0 movups %xmm0,-0x40(%rbp) > cb9a9d: 0f 11 4d d0 movups %xmm1,-0x30(%rbp) > ... > # do the actual malloc: > cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> > > # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX): > cb9b0f: 48 8d 4d c0 lea -0x40(%rbp),%rcx > ... > cb9b19: e8 f2 b7 f3 ff callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> > > > This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled. > > --------------------- > > The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only. > > This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC. > > In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. 
So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway: > > > 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>: > ... > # load tracking level > cb9907: 48 8d 1d 72 46 78 00 lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE> > cb990e: 8b 03 mov (%rbx),%eax > # detail (3) tracking? > cb9910: 83 f8 03 cmp $0x3,%eax > # yes: go and collect callstack > cb9913: 0f 84 37 01 00 00 je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160> > # no: nothing more to do ... > ... > # do the actual malloc: > cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> > ... > # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway: > cb9987: 48 8d 4d c0 lea -0x40(%rbp),%rcx > .. > cb9991: e8 ba b8 f3 ff callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> > > > There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent. > > -------------- > > Results: > > When profiling, I see os::malloc now needs less cycles, and the hotspot around the xmm instructions is not there anymore. 
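The copy described above follows from how the conditional operator treats a temporary on one side and a reference to an existing object on the other. The following is a minimal sketch with a stand-in type, not HotSpot code: `CallStack`, `record` and the two helper functions are hypothetical names. Unlike the patched HotSpot constructor, the default constructor here zero-fills the frames so the sketch stays well-defined.

```cpp
// Stand-in for NativeCallStack; counts copy-constructor calls.
struct CallStack {
  static int copies;
  void* frames[4];

  CallStack() : frames() {}                      // zero-filled in this sketch
  explicit CallStack(int /*skip*/) : frames() {} // stands in for a stack walk
  CallStack(const CallStack& other) {
    for (int i = 0; i < 4; ++i) frames[i] = other.frames[i];
    ++copies;
  }

  static const CallStack& empty_stack() {
    static const CallStack singleton(0);
    return singleton;
  }
};
int CallStack::copies = 0;

// Callee takes a const reference, like os::malloc in the description.
inline void record(const CallStack&) {}

// Old macro shape: temporary on one side, singleton reference on the other.
// The expression as a whole is a prvalue, so choosing the singleton
// forces a copy of it into a temporary.
inline int copies_with_singleton_branch(bool detail) {
  CallStack::copies = 0;
  record(detail ? CallStack(0) : CallStack::empty_stack());
  return CallStack::copies;
}

// New macro shape: both sides are temporaries of the same type.
inline int copies_with_default_ctor_branch(bool detail) {
  CallStack::copies = 0;
  record(detail ? CallStack(0) : CallStack());
  return CallStack::copies;
}
```

With the singleton on the right-hand side, the non-detail path performs at least one copy even though the callee only wants a reference. With two temporaries there is nothing to copy from, and with C++17's guaranteed copy elision the count is expected to be zero.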
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Feedback Ioi and David ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11040/files - new: https://git.openjdk.org/jdk/pull/11040/files/8657b2d4..ac2d333d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11040&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11040&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11040.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11040/head:pull/11040 PR: https://git.openjdk.org/jdk/pull/11040 From stefank at openjdk.org Mon Nov 14 09:28:38 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 14 Nov 2022 09:28:38 GMT Subject: RFR: 8296886: Fix various include sort order issues In-Reply-To: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> Message-ID: <h5ph1Zz-NXkaf65cn9rW33rps6qW6RPOaJYrj7RrO04=.12781fcc-3669-456f-829c-3b2416c83c0c@github.com> On Fri, 11 Nov 2022 14:26:20 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. > > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. 
That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards. Closing this down and restarting this discussion with limited patch, which only adds the missing include/ dir: #11133. ------------- PR: https://git.openjdk.org/jdk/pull/11108 From stuefe at openjdk.org Mon Nov 14 09:32:19 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 09:32:19 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled In-Reply-To: <A43rcL2ECFfsPx0O49yVfdBLe6DtEVGBaHguxjI2cXw=.d079db0f-9d81-4a15-a883-ceb8c4640cb1@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <A43rcL2ECFfsPx0O49yVfdBLe6DtEVGBaHguxjI2cXw=.d079db0f-9d81-4a15-a883-ceb8c4640cb1@github.com> Message-ID: <-SDY-qXt7AqarYO7Z4EFxYIdQagMGZRUGOBH77Vv4FE=.fc395620-2e0a-4d91-b257-a9f9dbf91eb1@github.com> On Thu, 10 Nov 2022 20:41:49 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. >> >> The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : >> >> >> #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? 
\ >> NativeCallStack(0) : NativeCallStack::empty_stack()) >> #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(1) : NativeCallStack::empty_stack()) >> >> >> and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc: >> >> >> void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack); >> >> >> In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()). >> >> However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots): >> >> >> 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>: >> ... >> # Load tracking level >> cb9a77: 48 8d 1d 02 35 78 00 lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE> >> cb9a7e: 8b 03 mov (%rbx),%eax >> # detail (3) tracking? 
>> cb9a80: 83 f8 03 cmp $0x3,%eax >> # yes: go and collect callstack >> cb9a83: 0f 84 57 01 00 00 je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180> >> # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals: >> cb9a89: 48 8d 05 30 44 78 00 lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE> >> cb9a90: f3 0f 6f 00 movdqu (%rax),%xmm0 >> cb9a94: f3 0f 6f 48 10 movdqu 0x10(%rax),%xmm1 >> cb9a99: 0f 11 45 c0 movups %xmm0,-0x40(%rbp) >> cb9a9d: 0f 11 4d d0 movups %xmm1,-0x30(%rbp) >> ... >> # do the actual malloc: >> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> >> >> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX): >> cb9b0f: 48 8d 4d c0 lea -0x40(%rbp),%rcx >> ... >> cb9b19: e8 f2 b7 f3 ff callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled. >> >> --------------------- >> >> The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only. >> >> This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC. >> >> In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway: >> >> >> 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>: >> ... 
>> # load tracking level >> cb9907: 48 8d 1d 72 46 78 00 lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE> >> cb990e: 8b 03 mov (%rbx),%eax >> # detail (3) tracking? >> cb9910: 83 f8 03 cmp $0x3,%eax >> # yes: go and collect callstack >> cb9913: 0f 84 37 01 00 00 je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160> >> # no: nothing more to do ... >> ... >> # do the actual malloc: >> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> >> ... >> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway: >> cb9987: 48 8d 4d c0 lea -0x40(%rbp),%rcx >> .. >> cb9991: e8 ba b8 f3 ff callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent. >> >> -------------- >> >> Results: >> >> When profiling, I see os::malloc now needs less cycles, and the hotspot around the xmm instructions is not there anymore. > > (I commented before but for some reason it's been lost). > > There's one use of the default constructor that you've missed (I found that by removing the body of the NativeCallStack() constructor): > > > ReservedMemoryRegion(const ReservedMemoryRegion& rr) : > VirtualMemoryRegion(rr.base(), rr.size()) { > *this = rr; > } > > > I think it will be much safer to leave the existing default constructor, and have something like: > > > private: > NativeCallStack(int dummy) { > _dummy[0] = NULL; > } > > public: > inline static NativeCallStack fake_stack() { > NativeCallStack fake(0); > return fake; > } > > > This will keep the behavior the same as before. Note that your patch will change the behavior if the fake stack is actually used. 
E.g., for this function: > > > inline bool is_empty() const { > return _stack[0] == NULL; > } > > > If you are absolutely sure that the fake stacks are never used, and really want to get rid of the `_stack[0] = NULL`, I would suggest adding a new debug-only field, and add asserts like this in all public functions: > > > inline bool is_empty() const { > assert(!is_fake, "sanity"); > return _stack[0] == NULL; > } Hi @iklam, @dholmes-ora, I changed the numerical value of the fake address to one that cannot be part of a valid callstack (0xFF....FE). I tried to change the type of _fake_address to address to save two casts as @dholmes-ora suggested, but was not able to convince the compiler to init it with a numerical, regardless how I modified the cast. Maybe it has something to do with it being constexpr. Casting the value to address in code seemed no problem, so I left that. I re-ran runtime/NMT and all gtests for x64 and x86. ------------- PR: https://git.openjdk.org/jdk/pull/11040 From stefank at openjdk.org Mon Nov 14 09:34:05 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 14 Nov 2022 09:34:05 GMT Subject: RFR: 8296926: Use proper include lines for files in include/ Message-ID: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. 
To remove this special case, I've removed the extraneous makefile entry that put src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. ------------- Commit messages: - 8296926: Use proper include lines for files in include/ Changes: https://git.openjdk.org/jdk/pull/11133/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11133&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296926 Stats: 270 lines in 154 files changed: 109 ins; 116 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/11133.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11133/head:pull/11133 PR: https://git.openjdk.org/jdk/pull/11133 From bkilambi at openjdk.org Mon Nov 14 09:37:53 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 14 Nov 2022 09:37:53 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v4] In-Reply-To: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> Message-ID: <j-0jvuuIm4sNxJlTzPFuzTI9qJISfkvZexu9bbPm3oU=.77e4e8f2-ab1c-41b8-9a2f-96911a82aedd@github.com> > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction.
For example - > > eor a, a, b > eor a, a, c > > can be optimized to single instruction - `eor3 a, b, c` > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - > > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > > The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Removed svesha3 feature check for eor3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10407/files - new: https://git.openjdk.org/jdk/pull/10407/files/449524ad..7f413360 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10407&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10407&range=02-03 Stats: 16 lines in 6 files changed: 0 ins; 9 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/10407.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10407/head:pull/10407 PR: https://git.openjdk.org/jdk/pull/10407 From bkilambi at openjdk.org Mon Nov 14 09:37:54 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 14 Nov 2022 09:37:54 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v3] In-Reply-To: <UPrEKE3RiZtoukgCSeQo5o5ELlXyQ8iI2m7KGiqINg4=.a079f251-f850-4313-9655-a5e770682d48@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> 
<UPrEKE3RiZtoukgCSeQo5o5ELlXyQ8iI2m7KGiqINg4=.a079f251-f850-4313-9655-a5e770682d48@github.com> Message-ID: <TR-4vKGuNh3OfEjczT0dQv1heGFlEbnpcKDdkXVGQ44=.b7247609-77b4-429e-92dc-14acc51bc556@github.com> On Wed, 19 Oct 2022 14:27:34 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Changed the modifier order preference in JTREG test The new patch removes the svesha3 feature check for eor3 instruction. Eor3 instruction is part of the SHA3 feature but it is present by default in SVE2 and is not part of the SVESHA3 feature. Please review. Thank you .. 
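The equivalence that the EOR3 rule relies on can be illustrated with a small scalar sketch. This is only an illustration under stated assumptions: `eor3_model` and `two_eors` are hypothetical helper names, not HotSpot or Arm APIs, and each models a single 64-bit lane rather than a Neon or SVE2 vector register.

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of EOR3:
// a single operation that XORs three inputs.
static inline uint64_t eor3_model(uint64_t a, uint64_t b, uint64_t c) {
    return a ^ b ^ c;
}

// The two-instruction sequence that the backend rule is meant to replace.
static inline uint64_t two_eors(uint64_t a, uint64_t b, uint64_t c) {
    a = a ^ b;  // eor a, a, b
    a = a ^ c;  // eor a, a, c
    return a;
}
```

Because XOR is associative, both helpers return the same value for any inputs, which is why the matcher can fold the two chained operations into one instruction without changing results.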
------------- PR: https://git.openjdk.org/jdk/pull/10407 From tschatzl at openjdk.org Mon Nov 14 10:04:34 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 14 Nov 2022 10:04:34 GMT Subject: RFR: 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" In-Reply-To: <k5Sm0Pq9PlSgmgpQ9WVOAk2J3I_-m_vovXE8WW-D7fk=.a5ce0835-a5da-4a34-80e4-f10dd32b9fcb@github.com> References: <k5Sm0Pq9PlSgmgpQ9WVOAk2J3I_-m_vovXE8WW-D7fk=.a5ce0835-a5da-4a34-80e4-f10dd32b9fcb@github.com> Message-ID: <HcQGrFF3VHz4TOIGzodoCRU2dj9Iz76v46ZviBGcRV0=.9baac3de-b4f5-4305-a08f-0e5320f071b1@github.com> On Thu, 20 Oct 2022 08:28:09 GMT, Fei Yang <fyang at openjdk.org> wrote: > This is a trivial change renaming a formal parameter for ZBarrierSetAssembler::load_at. > > On AArch64 and RISC-V, the last formal parameter for ZBarrierSetAssembler::load_at > is named "tmp_thread". But the callers will pass an ordinary temporary register for > this parameter which has no relation with the thread register. We should rename this > formal parameter from "tmp_thread" to "tmp2". > > Testing: fastdebug builds on linux-aarch64 & linux-riscv64. Marked as reviewed by tschatzl (Reviewer). I would have changed it only for the use in x86_32 files, in the actual parameters, but others are okay as is, so go ahead pushing it. Sorry for holding up this review for so long. 
------------- PR: https://git.openjdk.org/jdk/pull/10783 From aph at openjdk.org Mon Nov 14 10:06:37 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 14 Nov 2022 10:06:37 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v4] In-Reply-To: <TwmQmg7Canmom_CSvAthOQIbBZaMPLXLfgaudOsoZD0=.bfb6f137-df79-40cf-b6d6-89b975832d66@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <TwmQmg7Canmom_CSvAthOQIbBZaMPLXLfgaudOsoZD0=.bfb6f137-df79-40cf-b6d6-89b975832d66@github.com> Message-ID: <XdPOTwPl5tegUE34UC9E9KA8PdKehAzH2DwA7JV5OGI=.a51c91c9-5fee-4f0b-a05c-a9eb787fe35b@github.com> On Mon, 14 Nov 2022 05:32:20 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > include missing os head file Kim said: > As a general note, as a reviewer my preference is against non-trivial and persnickety code changes that are scattered all over the code base. For something like this I'd prefer multiple more bite-sized changes that were dealing with specific uses. I doubt everyone agrees with me though. There's a lot of wisdom in what you say. It's far too easy to mess things up when doing cleanups for compiler warnings. Also, long patches never get enough reviewing. 
------------- PR: https://git.openjdk.org/jdk/pull/11115 From stefank at openjdk.org Mon Nov 14 10:09:31 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 14 Nov 2022 10:09:31 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v3] In-Reply-To: <Srzk-lLV5JsKV7KGHIMbRbON36doqa2xXVJ-2KrCEpo=.7501c608-51c5-439e-844f-83151ad87989@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <HzvC0hEgl-NTZ-I36novjuStycPmJG38kdYEA4DAGgE=.eee08df7-8725-423a-ba53-2cf1483ff6be@github.com> <Srzk-lLV5JsKV7KGHIMbRbON36doqa2xXVJ-2KrCEpo=.7501c608-51c5-439e-844f-83151ad87989@github.com> Message-ID: <lmyjFxCi0AF-F0GEixYT9RB4Eme4CodLFLtg788f2sY=.5536d2b2-603b-4535-ad64-f55afed88b24@github.com> On Mon, 14 Nov 2022 00:12:45 GMT, David Holmes <dholmes at openjdk.org> wrote: > > When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure out an initial capacity. > > The default initial capacity is 2, the explicit initial capacities are not 2 and in many cases >>> 2. So without an explicit capacity passed in these GrowableArrays would likely waste a lot of time unnecessarily growing. I don't think either of these parameters should really have a default value - in which case the order could have remained as it was. Sure, that's an alternative. Before committing to doing that, could you take a look at the similar change to CHeap allocated BitMaps #11084? I want the two classes to be similar in where we put the "allocation strategy". With both these changes I've pushed them to the beginning of the parameter list.
With your suggestion I either have to make GrowableArray and CHeapBitMap inconsistent, or I need to change CHeapBitMap to also have the MEMFLAGS at the end, which would force all users to set the `clear` parameter. I'm fine with either approach, but I'd like to get feedback on that before moving ahead with this PR. ------------- PR: https://git.openjdk.org/jdk/pull/11086 From eosterlund at openjdk.org Mon Nov 14 10:15:41 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 14 Nov 2022 10:15:41 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v2] In-Reply-To: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> Message-ID: <jkElxhIdcB5KvElkWQohXuv9cJHa5Mx0EQIDey3i19E=.50fc970f-7b31-4b33-bda1-0264005da2a7@github.com> > The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. > > In particular, > 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. > > 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. 
To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. > > 3) Refactoring the stack chunk allocation code > > Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Fix verification and RISC-V support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11111/files - new: https://git.openjdk.org/jdk/pull/11111/files/fc5996f2..7becc31e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11111&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11111&range=00-01 Stats: 9 lines in 4 files changed: 6 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11111.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11111/head:pull/11111 PR: https://git.openjdk.org/jdk/pull/11111 From eosterlund at openjdk.org Mon Nov 14 10:15:41 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 14 Nov 2022 10:15:41 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code In-Reply-To: <9vLlu1jO4Rh1tE1-Fm5xb-79FvYNmyLw6ORbZEkZcvM=.3c47a0b1-1183-40c1-b86b-707d0ca03d18@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <yt4yTcfj4Q41X8EparF0OPS2CufoZw9pvBceg0o75X4=.5c93b33a-e01d-47d3-8ffd-e1099bc626cf@github.com> <9vLlu1jO4Rh1tE1-Fm5xb-79FvYNmyLw6ORbZEkZcvM=.3c47a0b1-1183-40c1-b86b-707d0ca03d18@github.com> Message-ID: <oKfNXkgKfygB1iRO3XVY7J7vOH5fL4QbCgzvaIspgPw=.5ec3c4f7-eca8-4e3a-b346-bde569d9993a@github.com> On Sat, 12 Nov 2022 01:30:57 GMT, Fei Yang <fyang at openjdk.org> wrote: > > Nice to have PR 11111. It's gonna take a long time until we see 111111.
> > Nice PR number :-) May I ask if you could also add handling for riscv while you are at it? We have ported loom to this platform recently [1]. I can help perform the necessary testing if needed. > > [1] https://git.openjdk.org/jdk/commit/91292d56a9c2b8010466d105520e6e898ae53679 Sure. Included what I think is the required RISC-V fix in my last update. Please check it out, and hope it works for you. ------------- PR: https://git.openjdk.org/jdk/pull/11111 From eosterlund at openjdk.org Mon Nov 14 10:15:41 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 14 Nov 2022 10:15:41 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code In-Reply-To: <K4UA3mF3XBvUjVUy9c15q0CxlTgu0hsH7vpI2FUHz1w=.972d2f82-5819-40f1-a443-dc6c4e94f44b@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <K4UA3mF3XBvUjVUy9c15q0CxlTgu0hsH7vpI2FUHz1w=.972d2f82-5819-40f1-a443-dc6c4e94f44b@github.com> Message-ID: <Y_t70n75CPnEkRC9Z_b98N4iEEOKMViqEkElgRc-r3M=.13107f82-4d11-4c9f-9203-722d91ac9d7f@github.com> On Sat, 12 Nov 2022 08:08:15 GMT, Fei Yang <fyang at openjdk.org> wrote: > PS: I see JVM crashes when running Skynet with extra VM option: -XX:+VerifyContinuations on linux-aarch64 platform. 
> > $java --enable-preview -XX:+VerifyContinuations Skynet > > ``` > # A fatal error has been detected by the Java Runtime Environment: > > # after -XX: or in .hotspotrc: SuppressErrorAt=# > # Internal Error/stackChunkOop.cpp (/home/realfyang/openjdk-jdk/src/hotspot/share/oops/stackChunkOop.cpp:433), pid=1904185:433, tid=1904206 > > [thread 1904216 also had an error]# assert(_chunk->bitmap().at(index)) failed: Bit not set at index 208 corresponding to 0x0000000637c512d0 > > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.realfyang.openjdk-jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.realfyang.openjdk-jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > ``` Thanks for finding that. Turns out that the verification code for the stack chunk bitmap expected entries even when the value is null, while the logic that added bitmap entries didn't add if it was null. I fixed it by making sure even null entries are added to the bitmap. While it doesn't really matter if they are added or not, I think it would be the least surprising if iterating over the oops with and without the bitmap yields the same result. I have verified manually with all GCs that Skynet works with the verification flag, on x86_64 and AArch64. 
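The consistency property described above (walking the slots with and without the bitmap should visit the same entries, including null ones) can be sketched with a toy model. This is not HotSpot code: `ToyChunk`, `build_bitmap`, `visit_with_bitmap` and `visit_without_bitmap` are hypothetical names, and the per-slot `is_ref_slot` flag stands in for whatever oop-map information the real stack chunk code consults.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a stack chunk (hypothetical, for illustration only).
struct ToyChunk {
    std::vector<const void*> slots;  // slot contents; nullptr models a null reference
    std::vector<bool> is_ref_slot;   // true where a slot holds a reference (possibly null)
    std::vector<bool> bitmap;        // one bit per slot, filled in by build_bitmap()
};

// Mark every reference slot in the bitmap, including slots whose value is
// null. Skipping null slots here is what would make the two walks disagree.
inline void build_bitmap(ToyChunk& chunk) {
    chunk.bitmap.assign(chunk.slots.size(), false);
    for (std::size_t i = 0; i < chunk.slots.size(); i++) {
        if (chunk.is_ref_slot[i]) {
            chunk.bitmap[i] = true;  // deliberately no null check
        }
    }
}

// Walk driven by the bitmap: returns the indices of the slots visited.
inline std::vector<std::size_t> visit_with_bitmap(const ToyChunk& chunk) {
    std::vector<std::size_t> visited;
    for (std::size_t i = 0; i < chunk.bitmap.size(); i++) {
        if (chunk.bitmap[i]) visited.push_back(i);
    }
    return visited;
}

// Walk driven by the slot metadata: returns the indices of the slots visited.
inline std::vector<std::size_t> visit_without_bitmap(const ToyChunk& chunk) {
    std::vector<std::size_t> visited;
    for (std::size_t i = 0; i < chunk.slots.size(); i++) {
        if (chunk.is_ref_slot[i]) visited.push_back(i);
    }
    return visited;
}
```

In this model a verifier can simply compare the two index lists; with null slots included in the bitmap they are identical by construction.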
------------- PR: https://git.openjdk.org/jdk/pull/11111 From stuefe at openjdk.org Mon Nov 14 10:16:58 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 10:16:58 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks [v4] In-Reply-To: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> Message-ID: <LuMQx6RZTOp5_Yl_v6gFaQw4lMzhdTv7neo45cZX1Pk=.d1968e1b-3384-414a-8573-745e1e41faba@github.com> > When doing performance- and footprint analysis, `AlwaysPreTouch` option is very handy for reducing noise. It would be good to have a similar option for pre-touching thread stacks. In addition to reducing noise, it can serve as worst-case test for thread costs, as well as a test for NMT regressions. > > Patch adds a new diagnostic switch, `AlwaysPreTouchStacks`, as a companion switch to `AlwaysPreTouch`. Touching is super-simple using `alloca()`. Also, regression test. 
> > Examples: > > NMT, thread stacks, 10000 Threads, default: > > > - Thread (reserved=10332400KB, committed=331828KB) > (thread #10021) > (stack: reserved=10301560KB, committed=300988KB) > (malloc=19101KB #60755) > (arena=11739KB #20037) > > > NMT, thread stacks, 10000 Threads, +AlwaysPreTouchStacks: > > > - Thread (reserved=10332400KB, committed=10284360KB) > (thread #10021) > (stack: reserved=10301560KB, committed=10253520KB) > (malloc=19101KB #60755) > (arena=11739KB #20037) Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: test changes, comment change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10403/files - new: https://git.openjdk.org/jdk/pull/10403/files/322dc77f..9cacecbf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10403&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10403&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10403.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10403/head:pull/10403 PR: https://git.openjdk.org/jdk/pull/10403 From eosterlund at openjdk.org Mon Nov 14 10:22:34 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 14 Nov 2022 10:22:34 GMT Subject: RFR: 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" In-Reply-To: <k5Sm0Pq9PlSgmgpQ9WVOAk2J3I_-m_vovXE8WW-D7fk=.a5ce0835-a5da-4a34-80e4-f10dd32b9fcb@github.com> References: <k5Sm0Pq9PlSgmgpQ9WVOAk2J3I_-m_vovXE8WW-D7fk=.a5ce0835-a5da-4a34-80e4-f10dd32b9fcb@github.com> Message-ID: <HMvNlppMXp7WLUUIPlBCdq1Gx49ajYwFdapLpTQq5Vk=.c75860f9-58a5-4201-9168-5e40bfa45561@github.com> On Thu, 20 Oct 2022 08:28:09 GMT, Fei Yang <fyang at openjdk.org> wrote: > This is a trivial change renaming a formal parameter for ZBarrierSetAssembler::load_at. > > On AArch64 and RISC-V, the last formal parameter for ZBarrierSetAssembler::load_at > is named "tmp_thread". 
But the callers will pass an ordinary temporary register for > this parameter which has no relation with the thread register. We should rename this > formal parameter from "tmp_thread" to "tmp2". > > Testing: fastdebug builds on linux-aarch64 & linux-riscv64. The tmp_thread name came from the x86 port where there are places where the tmp_thread register can't be used arbitrarily, but rather is limited to contain the thread pointer, as there is no dedicated thread register on 32 bit x86. The name was copied to the other platforms and it never made sense. I'm happy with this cleanup. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/10783 From stuefe at openjdk.org Mon Nov 14 10:23:28 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 10:23:28 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v4] In-Reply-To: <TwmQmg7Canmom_CSvAthOQIbBZaMPLXLfgaudOsoZD0=.bfb6f137-df79-40cf-b6d6-89b975832d66@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <TwmQmg7Canmom_CSvAthOQIbBZaMPLXLfgaudOsoZD0=.bfb6f137-df79-40cf-b6d6-89b975832d66@github.com> Message-ID: <C2hmvi594kwAJ4-p2xQdGtlmjimzI5TfvZHcBueskBc=.067cf062-8163-48e6-aea4-8d70e61f3f5f@github.com> On Mon, 14 Nov 2022 05:32:20 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
>> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > include missing os head file src/hotspot/share/adlc/output_c.cpp line 2570: > 2568: int idx = inst.operand_position_format(arg_name); > 2569: if (strcmp(arg_name, "constanttablebase") == 0) { > 2570: ib += snprintf(ib, (buflen - (ib - idxbuf)), " unsigned idx_%-5s = mach_constant_base_node_input(); \t// %s, \t%s\n", Use sizeof(buffer) instead of buflen? Also, possibly using a helper macro like this: #define remaining_buflen(buffer, position) (sizeof(buffer) - (position - buffer)) would make the code a bit easier on the eye. Or, if not a macro, an inline helper function, that could assert also array boundaries. ------------- PR: https://git.openjdk.org/jdk/pull/11115 From stefank at openjdk.org Mon Nov 14 10:46:36 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 14 Nov 2022 10:46:36 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v2] In-Reply-To: <jkElxhIdcB5KvElkWQohXuv9cJHa5Mx0EQIDey3i19E=.50fc970f-7b31-4b33-bda1-0264005da2a7@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <jkElxhIdcB5KvElkWQohXuv9cJHa5Mx0EQIDey3i19E=.50fc970f-7b31-4b33-bda1-0264005da2a7@github.com> Message-ID: <gChJ62Cl0dYXG916j4nVIgsIzWo-J55hXgGxEey6CVI=.f014727c-0334-4280-acde-5b81d569080a@github.com> On Mon, 14 Nov 2022 10:15:41 GMT, Erik ?sterlund <eosterlund at openjdk.org> wrote: >> The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. >> >> In particular, >> 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. 
With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. >> >> 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. >> >> 3) Refactoring the stack chunk allocation code >> >> Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Fix verification and RISC-V support Looks good to me. I wrote parts of this code, so I want an extra Reviewer on this patch. I wonder if we should rename the title to something less ZGC specific? src/hotspot/share/prims/stackwalk.hpp line 102: > 100: Method* method() override { return _vfst.method(); } > 101: int bci() override { return _vfst.bci(); } > 102: oop cont() override { return _vfst.continuation(); } Revert ------------- Marked as reviewed by stefank (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11111 From eosterlund at openjdk.org Mon Nov 14 10:52:14 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 14 Nov 2022 10:52:14 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v2] In-Reply-To: <gChJ62Cl0dYXG916j4nVIgsIzWo-J55hXgGxEey6CVI=.f014727c-0334-4280-acde-5b81d569080a@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <jkElxhIdcB5KvElkWQohXuv9cJHa5Mx0EQIDey3i19E=.50fc970f-7b31-4b33-bda1-0264005da2a7@github.com> <gChJ62Cl0dYXG916j4nVIgsIzWo-J55hXgGxEey6CVI=.f014727c-0334-4280-acde-5b81d569080a@github.com> Message-ID: <Ckz5VJr0LpXA5hedI_LLdo0hOkov49FkZ3HivgebHUI=.1c1a437d-c35b-4011-a7be-e9c752532e62@github.com> On Mon, 14 Nov 2022 10:43:23 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix verification and RISC-V support > > Looks good to me. I wrote parts of this code, so I want an extra Reviewer on this patch. > > I wonder if we should rename the title to something less ZGC specific? Thanks for the review @stefank! 
------------- PR: https://git.openjdk.org/jdk/pull/11111 From aph at openjdk.org Mon Nov 14 11:11:27 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 14 Nov 2022 11:11:27 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v2] In-Reply-To: <vIdqGyQkQAXZauPAFCN2CkP_sG-N2nN90_HYrOHisYA=.e704c782-83d0-4464-b646-75f2cda0b686@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <x8nDS5oBvPgUmNTQie92RqlCGQpIvXY2Ribuu-YIeg0=.541a91ec-7dda-49b5-a274-6c127d6b1039@github.com> <vIdqGyQkQAXZauPAFCN2CkP_sG-N2nN90_HYrOHisYA=.e704c782-83d0-4464-b646-75f2cda0b686@github.com> Message-ID: <SRjjAqG5gAsvlkrnUTYFXBwO_Urv7DC1Z3yse50RPb4=.c7a1f821-af0b-4383-a69a-612585ebc17c@github.com> On Mon, 14 Nov 2022 08:28:04 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > unfortunately, your patch will make backporting more difficult. We cannot downport it to older releases compiled with older compilers. But since it touches a lot of files it will sit smack in the middle of patch sequences, requiring manual merges for patches after it. > > Is there any benefit to using the new syntax compared to the old one? It does seem similar verbose, so I don't see any benefit there. I have to agree with Thomas here. Newer is not always better. 
------------- PR: https://git.openjdk.org/jdk/pull/11081 From kbarrett at openjdk.org Mon Nov 14 11:41:17 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 14 Nov 2022 11:41:17 GMT Subject: RFR: 8296886: Fix various include sort order issues In-Reply-To: <2lOleILw53UHm1WCmH4QPBsjbisb1h6H7Gz9jJRtJ8A=.139da306-4220-438c-be4c-64be6dc14c5d@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> <2lOleILw53UHm1WCmH4QPBsjbisb1h6H7Gz9jJRtJ8A=.139da306-4220-438c-be4c-64be6dc14c5d@github.com> Message-ID: <MfGGgkYsvTjqm5ClboYxOSRR5ar9iNUGAzINXl3gkOQ=.67e650a1-31b5-4cb4-a5bb-e62389c8c4ba@github.com> On Fri, 11 Nov 2022 21:21:43 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. >> >> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share , just like the other platform-independent headers in HotSpot. 
>> >> While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards. > > src/hotspot/os/windows/jvm_windows.cpp line 27: > >> 25: #include "precompiled.hpp" >> 26: #include "include/jvm.h" >> 27: #include "os_windows.hpp" > > os_windows should be at the end, included using `OS_HEADER("os")`. But should we be directly including os_windows.hpp, rather than including os.hpp? ------------- PR: https://git.openjdk.org/jdk/pull/11108 From jwaters at openjdk.org Mon Nov 14 12:00:28 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 14 Nov 2022 12:00:28 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v2] In-Reply-To: <_hwdz6oHBktC95v1xdiK-szAPmho2bxrDMCWBUAPGo0=.2ff0735d-5fda-4322-ac1f-1b4bec31dc56@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <JifpzUjotcu252D3-usFaP6YYOm26UkGxNigCl8TyS4=.ad93c4a9-b88d-40f5-adc4-461e78311a5b@github.com> <_hwdz6oHBktC95v1xdiK-szAPmho2bxrDMCWBUAPGo0=.2ff0735d-5fda-4322-ac1f-1b4bec31dc56@github.com> Message-ID: <aRZ0z2fQtE_oqBULtBDXoWOjZTczhBd77qXOuxkA1a0=.a485227d-efdf-42d6-9a79-a01959ddce42@github.com> On Mon, 14 Nov 2022 08:08:40 GMT, David Holmes <dholmes at openjdk.org> wrote: >> The ATTRIBUTE_PRINTF macros are still there, just moved in front of the methods > > Wow I'm really having eyesight problems today! Sorry about that. 
Haha, no worries ------------- PR: https://git.openjdk.org/jdk/pull/11081 From kbarrett at openjdk.org Mon Nov 14 12:14:29 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 14 Nov 2022 12:14:29 GMT Subject: RFR: 8296926: Use proper include lines for files in include/ In-Reply-To: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> Message-ID: <ie1izA5PqtodXGxKwtgp4mHWgJvIVQm5p8txTQMOCuQ=.1759b455-df96-442a-8575-f525a21e2169@github.com> On Mon, 14 Nov 2022 09:25:11 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot . > > This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. This looks fine to me. 
Since the headers in the include/ directory are the interface exposed by HotSpot to the rest of the JDK, perhaps a better name might have been "hotspot" or "hotspot_api" or something like that. OTOH, the clients in the rest of the JDK include them without a directory, so the directory name doesn't matter to them. I'm fine with the existing name. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/11133 From jwaters at openjdk.org Mon Nov 14 12:24:31 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 14 Nov 2022 12:24:31 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v2] In-Reply-To: <mowWW-lO9a5Zo4iT-sbh1YTZFXO4UjKuJK7OpfrhFFo=.c7e5430a-de42-4465-ba85-10dfb8c71184@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <0fVP40VVRuOoZCEJ1M3BLubshBHbD4m_lj-j1qaGTTk=.391ade50-b3fe-4d4e-ae71-ba8a975a31cd@github.com> <mowWW-lO9a5Zo4iT-sbh1YTZFXO4UjKuJK7OpfrhFFo=.c7e5430a-de42-4465-ba85-10dfb8c71184@github.com> Message-ID: <UwMOA0K5cYSIeTkRgIl6QlPe2iTTA5z0vCH3jzUmx4E=.2b35fc6c-e96a-4807-863f-583631128a4e@github.com> On Mon, 14 Nov 2022 08:01:27 GMT, David Holmes <dholmes at openjdk.org> wrote: >> I'm not sure I follow, I didn't remove anything here? > > Sorry my eyes must be playing tricks on me. > > Why did you need to add this here? It's to avoid redefining the linkage as static in os_windows.cpp (where it's implemented) after an extern declaration (inside the class), which is forbidden by C++11: > The linkages implied by successive declarations for a given entity shall agree. That is, within a given scope, each declaration declaring the same variable name or the same overloading of a function name shall imply the same linkage. 
While Visual C++ 2019 by default seems to ignore this rule and accepts the conflicting linkage as a language extension, this can cause issues with newer and stricter versions of the Visual C++ compiler (especially with -permissive- passed during compilation, which Magnus and Daniel have pointed out in another discussion will become the default mode of compilation in the future). It's not possible to declare a static friend inside a class, so the addition above takes advantage of another C++ feature instead: > §11.3/4 [class.friend] A function first declared in a friend declaration has external linkage (3.5). Otherwise, the function retains its previous linkage (7.1.1). ------------- PR: https://git.openjdk.org/jdk/pull/11081 From dholmes at openjdk.org Mon Nov 14 12:53:29 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 12:53:29 GMT Subject: RFR: 8296926: Use proper include lines for files in include/ In-Reply-To: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> Message-ID: <9VDQOwxKWG4ynIq6Kq0_g9l4Mu9Q-GJCwI6dnFh-hyc=.8434a7c1-096c-49e5-bc0e-b8a8911462a8@github.com> On Mon, 14 Nov 2022 09:25:11 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. 
To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot . > > This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. Please rename `share/include` to something more meaningful e.g. `share/export` or `share/hotspot` (as Kim suggested). Software that contains an `include` directory typically also specifies `-Iinclude/` so that you see `#include "foo.h"` not `#include "include/foo.h"`. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/11133 From kbarrett at openjdk.org Mon Nov 14 13:42:02 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 14 Nov 2022 13:42:02 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v2] In-Reply-To: <x8nDS5oBvPgUmNTQie92RqlCGQpIvXY2Ribuu-YIeg0=.541a91ec-7dda-49b5-a274-6c127d6b1039@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <x8nDS5oBvPgUmNTQie92RqlCGQpIvXY2Ribuu-YIeg0=.541a91ec-7dda-49b5-a274-6c127d6b1039@github.com> Message-ID: <ZFIm743da2bNDg0P_C6hPwK8ld2QAZSiR1vt3rQJFC0=.1d280d20-15a1-42fb-970b-07e565a320c5@github.com> On Mon, 14 Nov 2022 04:14:24 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler. 
These cleanups were highlighted by the very briefly integrated 8296115 >> >> No changes to the behaviour of the JDK has resulted in any way from this commit > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > ATTRIBUTE_SCANF Changes requested by kbarrett (Reviewer). make/hotspot/lib/CompileJvm.gmk line 67: > 65: # Hotspot cannot handle an empty build number > 66: VERSION_BUILD := 0 > 67: endif I think the proposed "solution" is *much* worse than this. ------------- PR: https://git.openjdk.org/jdk/pull/11081 From kbarrett at openjdk.org Mon Nov 14 13:42:04 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 14 Nov 2022 13:42:04 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v2] In-Reply-To: <cvaF0K8j70TF2vlmgBHzQYAOiYSeP7uZ_Y4yavC0J2w=.e69dd78a-7b52-4a00-969e-e4a195623c2b@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <cvaF0K8j70TF2vlmgBHzQYAOiYSeP7uZ_Y4yavC0J2w=.e69dd78a-7b52-4a00-969e-e4a195623c2b@github.com> Message-ID: <PqiJunKhY85BbqgIB5X96kg16Dm-mTszPAFzYOjsfWs=.ebb72cc2-e4e0-4e8f-823e-1f5f97881a6a@github.com> On Mon, 14 Nov 2022 01:42:40 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> make/autoconf/flags-cflags.m4 line 632: >> >>> 630: if test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang; then >>> 631: STATIC_LIBS_CFLAGS="$STATIC_LIBS_CFLAGS -ffunction-sections -fdata-sections \ >>> 632: -DJNIEXPORT='[[gnu::visibility(\"hidden\")]]'" >> >> So IIUC we now use attributes via the C++11 syntax rather than compiler-specific syntax - even where the C++11 syntax is referring to a compiler specific attribute. Is that right? 
> > Yep, just something that C++ does a little neater, at least in my view We (the HotSpot Group) have not discussed or approved the use of the new C++ attribute syntax, whether for standard attributes or compiler-specific ones. That involves an update to the Style Guide. I'm not convinced switching existing uses from compiler-specific `__attribute__` syntax to compiler-specific `[[attribute]]` syntax is worth the substantial code churn. ------------- PR: https://git.openjdk.org/jdk/pull/11081 From stuefe at openjdk.org Mon Nov 14 15:17:24 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 14 Nov 2022 15:17:24 GMT Subject: RFR: JDK-8296931: NMT tests slowed down considerably by JDK-8242181 Message-ID: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> We noticed that NMT tests on our slower PPC machines started failing. The reason is that NMT detail reports have become 2-5x slower. This is caused by us now parsing the dwarf debug information to extract source information for each PC in each call stack. That is nice but costly. The slowdown is not limited to PPC, it affects all Elf platforms. On my Linux x64 box, runtime/NMT/VirtualAllocCommitMerge.java increased from 20 to 90 seconds. --- This patch simply removes source info from NMT call stacks. They are not that important for pinpointing leaks and such. I considered more involved solutions, like making them optional via an argument to the NMT report command, but decided against it. The added benefit would be small, not worth much complexity. With this patch, on my box with -conc 4 all NMT together are about 2.5 x faster (2m56 -> 1m09). 
------------- Commit messages: - disable NMT stack printing Changes: https://git.openjdk.org/jdk/pull/11135/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11135&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296931 Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/11135.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11135/head:pull/11135 PR: https://git.openjdk.org/jdk/pull/11135 From luhenry at openjdk.org Mon Nov 14 15:32:32 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 14 Nov 2022 15:32:32 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <mAsTs03BWlAM_YhHXHBNsaJIQqrX95deQyYAYkJwrmE=.7e36f69c-dffc-462a-bf8a-7614153bf8c4@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com> <mAsTs03BWlAM_YhHXHBNsaJIQqrX95deQyYAYkJwrmE=.7e36f69c-dffc-462a-bf8a-7614153bf8c4@github.com> Message-ID: <PG2tjiegNRpv4XIh8oLSFPzW8FxEMv3jyghbKYjmaHw=.67f18364-8bf9-4975-b24b-c3340c70eef1@github.com> On Sun, 13 Nov 2022 21:08:53 GMT, Claes Redestad <redestad at openjdk.org> wrote: > Also, I'd like to note that C2 auto-vectorization support is not too far away from being able to optimize hash code computations. At some point, I was able to achieve some promising results with modest tweaking of SuperWord pass: https://github.com/iwanowww/jdk/blob/superword/notes.txt http://cr.openjdk.java.net/~vlivanov/superword.reduction/webrev.00/ That would be extremely helpful not just for this case but for many other cases that today require the Vector API or handrolled intrinsics. 
For cases that would be great to support, a good guide is the [gcc autovectorization support](https://gcc.gnu.org/projects/tree-ssa/vectorization.html) given they use SLP as well. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From luhenry at openjdk.org Mon Nov 14 15:49:31 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 14 Nov 2022 15:49:31 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> Message-ID: <XP0qyXp11-t8K0GNu0s9QSUzTX_YLZGElnfH3V-0KTI=.3fb35ca5-cb29-4dfa-b3b9-110d756de4ec@github.com> On Wed, 9 Nov 2022 07:57:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote: > The LEA instruction loads the effective address, but MD5 intrinsic uses it for computing values than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. > > This change replaces > LEA: r1 = r1 + rsi * 1 + t > with > ADDs: r1 += t; r1 += rsi. > > Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. > > No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. > > Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake. Could you please post JMH microbenchmarks with and without this change? 
You can run them with `org.openjdk.bench.java.security.MessageDigests` [1] [1] https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/security/MessageDigests.java ------------- PR: https://git.openjdk.org/jdk/pull/11054 From stefank at openjdk.org Mon Nov 14 16:04:30 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 14 Nov 2022 16:04:30 GMT Subject: RFR: 8296926: Use proper include lines for files in include/ In-Reply-To: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> Message-ID: <ORKl1lE7IuVrWBwe5TGaklZkumTw1ePMjNnrAhqD5ks=.b66bcb0b-536d-4b73-bc5d-0de0cc424232@github.com> On Mon, 14 Nov 2022 09:25:11 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot . > > This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. 
An alternative is to just accept that include/ is special, and shouldn't be specified, but then properly sort jvm.h and friends alphabetically, as specified in the Style Guide. I'm fine with either solution, but I really want to remove this half measure where only some of the directory-less include lines are put at the top of the include block. If we are going to have such a special-case rule, then I'd argue that we should come up with a structure that is easy to explain and maintain, and write it down in our Style Guide. ------------- PR: https://git.openjdk.org/jdk/pull/11133 From eosterlund at openjdk.org Mon Nov 14 16:07:34 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 14 Nov 2022 16:07:34 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v3] In-Reply-To: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> Message-ID: <tnSpXc7Z_RBHBNFrP-r-Fks1hWzplEn_OIwCwk5Vwo4=.94fb66f1-e927-4b73-b6e2-b8ecc01e903b@github.com> > The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. > > In particular, > 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. > > 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. 
But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. > > 3) Refactoring the stack chunk allocation code > > Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Indentation fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11111/files - new: https://git.openjdk.org/jdk/pull/11111/files/7becc31e..b20563f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11111&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11111&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11111.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11111/head:pull/11111 PR: https://git.openjdk.org/jdk/pull/11111 From jwaters at openjdk.org Mon Nov 14 16:12:48 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 14 Nov 2022 16:12:48 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> Message-ID: <gay-N6xDnfKHcngB9ddJIZD6Jfg2m_ZCzZn1gWPFN-o=.785036e8-1d1d-41d3-bac3-211b9d03cd71@github.com> > After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and 
simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler. These cleanups were highlighted by the very briefly integrated 8296115 > > No changes to the behaviour of the JDK has resulted in any way from this commit Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Revert to using simpler solution similar to the original 8274980 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11081/files - new: https://git.openjdk.org/jdk/pull/11081/files/bb3ef0dd..fe5371c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11081&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11081&range=01-02 Stats: 28 lines in 2 files changed: 4 ins; 23 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11081.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11081/head:pull/11081 PR: https://git.openjdk.org/jdk/pull/11081 From duke at openjdk.org Mon Nov 14 17:50:47 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 14 Nov 2022 17:50:47 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <hp3DuRUsHA39f-YoJ2AuxY-7ZcVPKDTFXFVB46xsDBo=.d077e145-3b07-43c5-93f4-6d00c29f101f@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> <IgOknAH9HUWa5VQropGAIp7I5QXLUGf6IspI_QVEe-Q=.85cfcc50-3c66-4fd3-b23a-65c40bb87423@github.com> <orY8dXwBoWdbN04iP_wTouuDWQl9aqjOFV9APiHY3Dc=.32ab3d07-8d44-4f04-85bd-cc51eb61f414@github.com> <ocwevLqOwgHaiJ46CLgPJFVvDcalz4PWPjREnWQ20A4=.187a1435-ee73-44a5-a15f-f4dc2e9e4a2e@github.com> <pYQs4FMpxhCrhceVFxyy7tvGrPG3t8RzbKfSOSPyzbs=.e3b7e01d-bbd7-4f73-bf57-245389bd5ec1@github.com> 
<hp3DuRUsHA39f-YoJ2AuxY-7ZcVPKDTFXFVB46xsDBo=.d077e145-3b07-43c5-93f4-6d00c29f101f@github.com> Message-ID: <CShJXPrVP8_FNql7cOTu_GtUAkAQXi2OQBRwzksFULA=.1f2e9e51-226c-44da-a677-b437c3b2045c@github.com> On Fri, 11 Nov 2022 20:46:57 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> It's not specific to `andq`: there's a huge `#ifdef` block around the definitions in `assembler_x86.hpp` (lines 12201 - 13773; and there's even a nested `#ifdef _LP64` (lines 13515-13585)!) , but declarations aren't guarded by `#ifdef _LP64`. > > Yeah, just got to about the same conclusion by looking at the preprocessor `-E` output.. it's declared in the header, but not defined in the 'cpp' file.. One would think that that's a compile error, but it's been more than a decade since I looked at the C++ spec; 'C++ compiler is always right'. Don't know that there is anything else for me to do here? `assembler_x86.hpp` `#ifdef _LP64` macros were there before (and it is not 'that wrong', or if a better/clean fix exists). `macroAssembler_x86.hpp` has to mirror that with `andq`. (Just going through all the comments, making sure they have been addressed.) PS: In general I get worried about having macros changing object layout, but that's 'water under the bridge' and 64-bit seems a big enough reason to have a different layout. But it's always an 'entertaining debugging session' when the offset of `a.f` is different in `a.o` and `b.o`, because somebody forgot to define the same macros for the `b.c` compile command as for `a.c`.. 
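[Archive note] On the point above that `andq` is declared in the header but not defined in the cpp file: a non-virtual member function that is declared and never called needs no definition, so neither the compiler nor the linker complains. The sketch below illustrates that rule only. `MiniAssembler`, `low_bits_in_common` and the macro `HYPOTHETICAL_LP64` are made-up names for illustration, not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an assembler class; not the HotSpot Assembler.
class MiniAssembler {
 public:
  uint64_t and32(uint64_t a, uint64_t b);  // always defined below
  uint64_t andq(uint64_t a, uint64_t b);   // declared unconditionally
};

uint64_t MiniAssembler::and32(uint64_t a, uint64_t b) {
  return (a & b) & 0xFFFFFFFFull;  // keep only the low 32 bits
}

#ifdef HYPOTHETICAL_LP64
// Only defined on "64-bit" builds. When the macro is absent the declaration
// above stays valid; an error appears only if some code calls andq.
uint64_t MiniAssembler::andq(uint64_t a, uint64_t b) {
  return a & b;
}
#endif

// Uses only the function that is defined in every configuration, so this
// translation unit links with or without HYPOTHETICAL_LP64.
inline uint64_t low_bits_in_common(uint64_t a, uint64_t b) {
  MiniAssembler masm;
  uint64_t r = masm.and32(a, b);
  assert(r <= 0xFFFFFFFFull);
  return r;
}
```

A call to `andq` from code compiled without the macro would fail at link time with an undefined reference rather than at compile time, which is presumably why a wrapper has to be guarded the same way as the definition it forwards to.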
------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Mon Nov 14 17:51:38 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 14 Nov 2022 17:51:38 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <a7v9TcVjwr5esqqk7zOuMHmnoIv1vIqBoAFnhn4YhoU=.0907e089-2789-492f-9210-1ba3abbf00a3@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <IEqn-D6Umb4bhYeCvw6d-riT_R7Y6aag-wdMKDHfxSM=.22808600-b52d-49ea-9a39-9f5b27fe2f24@github.com> <a7v9TcVjwr5esqqk7zOuMHmnoIv1vIqBoAFnhn4YhoU=.0907e089-2789-492f-9210-1ba3abbf00a3@github.com> Message-ID: <p4kddEOZ_L6WOGjExJ3jR0qzH-id_Oj2PbCugRspQ-s=.f41bd7d6-042d-45fb-80d5-b327da4e941a@github.com> On Sun, 13 Nov 2022 21:01:21 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> src/hotspot/share/opto/intrinsicnode.hpp line 175: >> >>> 173: // as well as adjusting for special treatment of various encoding of String >>> 174: // arrays. Must correspond to declared constants in jdk.internal.util.ArraysSupport >>> 175: typedef enum HashModes { LATIN1 = 0, UTF16 = 1, BYTE = 2, CHAR = 3, SHORT = 4, INT = 5 } HashMode; >> >> I question the need for `LATIN1` and `UTF16` modes. If you lift some of input adjustments (initial value and input size) into JDK, it becomes indistinguishable from `BYTE`/`CHAR`. Then you can reuse existing constants for basic types. > > UTF16 can easily be replaced with CHAR by lifting up the shift as you say, but LATIN1 needs to be distinguished from BYTE since the former needs unsigned semantics. Modeling in a signed/unsigned input is possible, but I figured we might as well call it UNSIGNED_BYTE and decouple it logically from String::LATIN1. FTR `T_BOOLEAN` effectively represents unsigned byte. 
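[Archive note] The signed versus unsigned distinction discussed above can be seen in a small polynomial hash, h = 31*h + x. This is a sketch under stated assumptions: the function names are hypothetical and the code is plain C++, not the intrinsic or the JDK implementation. It assumes two's-complement narrowing conversions, which C++20 guarantees and earlier compilers provide in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Each byte is sign-extended before being added, as for a Java byte[].
inline int32_t hash_signed_bytes(const uint8_t* data, size_t len) {
  assert(data != nullptr || len == 0);
  uint32_t h = 0;  // unsigned arithmetic wraps without undefined behaviour
  for (size_t i = 0; i < len; i++) {
    int32_t x = static_cast<int8_t>(data[i]);  // 0xE9 becomes -23
    h = 31u * h + static_cast<uint32_t>(x);
  }
  return static_cast<int32_t>(h);
}

// Each byte is zero-extended before being added, as for Latin-1 characters.
inline int32_t hash_unsigned_bytes(const uint8_t* data, size_t len) {
  assert(data != nullptr || len == 0);
  uint32_t h = 0;
  for (size_t i = 0; i < len; i++) {
    h = 31u * h + static_cast<uint32_t>(data[i]);  // 0xE9 stays 233
  }
  return static_cast<int32_t>(h);
}
```

For pure ASCII input the two functions agree, because no byte has its top bit set. They diverge as soon as a byte is 0x80 or above, which is why a Latin-1 mode cannot simply reuse a signed byte mode.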
------------- PR: https://git.openjdk.org/jdk/pull/10847 From duke at openjdk.org Mon Nov 14 17:58:36 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 14 Nov 2022 17:58:36 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s
Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - Merge remote-tracking branch 'origin/master' into avx512-poly - Vladimir's review - live review with Sandhya - jcheck - Sandhya's review - fix windows and 32b linux builds - add getLimbs to interface and reviews - fix 32-bit build - make UsePolyIntrinsics option diagnostic - Merge remote-tracking branch 'origin/master' into avx512-poly - ... and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db ------------- Changes: https://git.openjdk.org/jdk/pull/10582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=15 Stats: 1851 lines in 32 files changed: 1815 ins; 3 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Mon Nov 14 17:58:37 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 14 Nov 2022 17:58:37 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v15] In-Reply-To: <g9b4K88VLczZ7zoHN0oisP9Qju0EvlQWW5voCyJlGPQ=.6363291f-20aa-4fbc-9a81-96af9b54c76f@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <g9b4K88VLczZ7zoHN0oisP9Qju0EvlQWW5voCyJlGPQ=.6363291f-20aa-4fbc-9a81-96af9b54c76f@github.com> Message-ID: <bleuvI4zHKuGW0CZdl_PJf0N5VUHoUzN7XtuPdOLfc8=.83be7f10-841e-40fe-b85b-ca8af13049f9@github.com> On Fri, 11 Nov 2022 17:56:55 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s
> > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Vladimir's review Try to get clean build, pull in https://github.com/openjdk/jdk/pull/11065 ------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Mon Nov 14 18:18:21 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 14 Nov 2022 18:18:21 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <pM2vZcm-QLZHS74sE03J-do6qrCpDTUBmU4TONgFo5k=.e2cacaa6-efbf-4d9c-9f41-f9d258c59f4c@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <mTgCRyZ8rmuNMWQfZuyaTrR0u40uNOuPcPL9ChOYqCE=.b1c8e88d-5eaa-4c6d-a22d-efed2cfe678d@github.com> <pM2vZcm-QLZHS74sE03J-do6qrCpDTUBmU4TONgFo5k=.e2cacaa6-efbf-4d9c-9f41-f9d258c59f4c@github.com> Message-ID: <We-bdA0-trL56XJ7DLWpw8jd77KnDKYGDI1E9BCo1qk=.fa5b9b86-78df-4ed9-aad9-e87d108d1e9c@github.com> On Sun, 13 Nov 2022 19:50:46 GMT, Claes Redestad <redestad at openjdk.org> wrote: > ... several challenges were brought up to the table, including how to deal with all the different contingencies that might be the result of a safepoint, including deoptimization. FTR if the intrinsic is represented as a stand-alone stub, there's no need to care about deoptimization. (In such cases, deopts happen on return from the stub.) It wouldn't be allowed to be a leaf call anymore, but a safepoint check and an OOP map would do the job.
------------- PR: https://git.openjdk.org/jdk/pull/10847 From vlivanov at openjdk.org Mon Nov 14 18:32:38 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 14 Nov 2022 18:32:38 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <mAsTs03BWlAM_YhHXHBNsaJIQqrX95deQyYAYkJwrmE=.7e36f69c-dffc-462a-bf8a-7614153bf8c4@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com> <mAsTs03BWlAM_YhHXHBNsaJIQqrX95deQyYAYkJwrmE=.7e36f69c-dffc-462a-bf8a-7614153bf8c4@github.com> Message-ID: <PrXrcgFzP4mr0ZI5WrUizT0Sz_wtTl6AJLS_mvW89dQ=.ae6c18d4-8f5e-48cf-b0bf-8d8db7852346@github.com> On Sun, 13 Nov 2022 21:08:53 GMT, Claes Redestad <redestad at openjdk.org> wrote: > How far off is this ...? Back then it looked way too constrained (tight constraints on code shapes). But I considered it as a generally applicable optimization. > ... do you think it'll be able to match the efficiency we see here with a memoized coefficient table etc? Yes, it is able to build the constant table at runtime when folding multiplications of constant coefficients produced during loop unrolling and then packing scalars into a constant vector. Moreover, briefly looking at the code shape, the vectorizer would produce a more optimal loop shape (pre-loop would align vector accesses and would use 512-bit vectors when available; vector post-loop could help as well). 
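The constant-table idea mentioned above can be sketched in scalar code. This is only an illustration of the arithmetic (four independent "lanes", each advanced by 31^4 per step, then weighted by a fixed coefficient table before the final reduction); it is an assumption about the general shape of such a loop, not what C2's vectorizer or the intrinsic actually emits, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference: the plain sequential polynomial hash, h = 31*h + a[i].
inline uint32_t hash_sequential(const uint8_t* a, size_t n) {
  assert(a != nullptr || n == 0);
  uint32_t h = 0;
  for (size_t i = 0; i < n; i++) {
    h = 31u * h + a[i];
  }
  return h;
}

// Lane formulation: kLanes accumulators that a vector unit could update together.
inline uint32_t hash_lanes(const uint8_t* a, size_t n) {
  assert(a != nullptr || n == 0);
  const size_t kLanes = 4;
  // Coefficient table {31^3, 31^2, 31^1, 31^0}, computed once up front.
  static const uint32_t kWeights[4] = {29791u, 961u, 31u, 1u};
  const uint32_t kStep = 923521u;  // 31^4: every lane advances by kLanes positions per iteration

  uint32_t acc[4] = {0, 0, 0, 0};
  const size_t bulk = n - (n % kLanes);
  for (size_t i = 0; i < bulk; i += kLanes) {
    for (size_t l = 0; l < kLanes; l++) {
      acc[l] = acc[l] * kStep + a[i + l];  // one multiply and one add per lane
    }
  }

  // Reduction: weight each lane by its position, then sum.
  uint32_t h = 0;
  for (size_t l = 0; l < kLanes; l++) {
    h += acc[l] * kWeights[l];
  }

  // Scalar post-loop for the remaining n % kLanes elements.
  for (size_t i = bulk; i < n; i++) {
    h = 31u * h + a[i];
  }
  return h;
}
```

Both functions return the same value for any input; for the ASCII bytes of "hello" that value is 99162322, matching Java's `"hello".hashCode()`.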
------------- PR: https://git.openjdk.org/jdk/pull/10847 From vlivanov at openjdk.org Mon Nov 14 18:39:29 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 14 Nov 2022 18:39:29 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <CShJXPrVP8_FNql7cOTu_GtUAkAQXi2OQBRwzksFULA=.1f2e9e51-226c-44da-a677-b437c3b2045c@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> <IgOknAH9HUWa5VQropGAIp7I5QXLUGf6IspI_QVEe-Q=.85cfcc50-3c66-4fd3-b23a-65c40bb87423@github.com> <orY8dXwBoWdbN04iP_wTouuDWQl9aqjOFV9APiHY3Dc=.32ab3d07-8d44-4f04-85bd-cc51eb61f414@github.com> <ocwevLqOwgHaiJ46CLgPJFVvDcalz4PWPjREnWQ20A4=.187a1435-ee73-44a5-a15f-f4dc2e9e4a2e@github.com> <pYQs4FMpxhCrhceVFxyy7tvGrPG3t8RzbKfSOSPyzbs=.e3b7e01d-bbd7-4f73-bf57-245389bd5ec1@github.com> <hp3DuRUsHA39f-YoJ2AuxY-7ZcVPKDTFXFVB46xsDBo=.d077e145-3b07-43c5-93f4-6d00c29f101f@github.com> <CShJXPrVP8_FNql7cOTu_GtUAkAQXi2OQBRwzksFULA=.1f2e9e51-226c-44da-a677-b437c3b2045c@github.com> Message-ID: <yAcsnq-lwFhAyP7NWuFIDDmsEBcyvC7TDUk5ri8hu-4=.6d2bfb6d-23b7-4d88-b587-109a3e89550b@github.com> On Mon, 14 Nov 2022 17:48:25 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Yeah, just got to about the same conclusion by looking at the preprocessor `-E` output.. its declared in the header, but not defined in the 'cpp' file.. One would think that that's a compile error, but its been more then a decade since I looked at the C++ spec; 'C++ compiler is always right'. > > Don't know that there is anything else for me to do here? `assembler_x86.hpp` `#ifdef _LP64` macros were there before (and it not 'that wrong' or if a better/clean fix exists). `macroAssembler_x86.hpp` has to mirror that with `andq`. 
> > (Just going through all the comments, making sure they have been addressed.) > > PS: In general I get worried about having macros changing object layout, but that's 'water under the bridge' and 64-bit seems big enough reason to have different layout. But its always 'entertaining debugging session' when offset of `a.f` is different in `a.o` and `b.o`, because somebody forgot to define same macros for `b.c` compile command as for `a.c`.. Leave it as is. It'll be addressed separately. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From xuelei at openjdk.org Mon Nov 14 19:30:25 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 14 Nov 2022 19:30:25 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v5] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <dWacRv0p7t2PTJHzpvmWFjEaAl9twz7IWmVOjKKeBpo=.997a2712-955b-428f-b756-0d99178bf3df@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: use helper macro ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/128bc806..32e18955 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=03-04 Stats: 10 lines in 3 files changed: 4 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From prr at openjdk.org Mon Nov 14 19:30:25 2022 From: prr at openjdk.org (Phil Race) Date: Mon, 14 Nov 2022 19:30:25 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v5] In-Reply-To: <dWacRv0p7t2PTJHzpvmWFjEaAl9twz7IWmVOjKKeBpo=.997a2712-955b-428f-b756-0d99178bf3df@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <dWacRv0p7t2PTJHzpvmWFjEaAl9twz7IWmVOjKKeBpo=.997a2712-955b-428f-b756-0d99178bf3df@github.com> Message-ID: <N1dNltiitV4jbcnmKR70xmbT1CYJWlb6LOWfxB0kONU=.41dc4552-551d-4c84-9890-a11e00556283@github.com> On Mon, 14 Nov 2022 19:05:16 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > use helper macro The single client change looks fine. I didn't look at the rest. ------------- Marked as reviewed by prr (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Mon Nov 14 19:30:29 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 14 Nov 2022 19:30:29 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v4] In-Reply-To: <C2hmvi594kwAJ4-p2xQdGtlmjimzI5TfvZHcBueskBc=.067cf062-8163-48e6-aea4-8d70e61f3f5f@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <TwmQmg7Canmom_CSvAthOQIbBZaMPLXLfgaudOsoZD0=.bfb6f137-df79-40cf-b6d6-89b975832d66@github.com> <C2hmvi594kwAJ4-p2xQdGtlmjimzI5TfvZHcBueskBc=.067cf062-8163-48e6-aea4-8d70e61f3f5f@github.com> Message-ID: <wmqpkV8xlNUC0--ANZ3oA2vM9J-sPsf2AMPV8EQ5yJ0=.5aba921d-ec51-42c1-94bf-d5ca4432521c@github.com> On Mon, 14 Nov 2022 10:21:07 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: >> >> include missing os head file > > src/hotspot/share/adlc/output_c.cpp line 2570: > >> 2568: int idx = inst.operand_position_format(arg_name); >> 2569: if (strcmp(arg_name, "constanttablebase") == 0) { >> 2570: ib += snprintf(ib, (buflen - (ib - idxbuf)), " unsigned idx_%-5s = mach_constant_base_node_input(); \t// %s, \t%s\n", > > Use sizeof(buffer) instead of buflen? > Also, possibly using a helper macro like this: > > > #define remaining_buflen(buffer, position) (sizeof(buffer) - (position - buffer)) > > would make the code a bit easier on the eye. Or, if not a macro, an inline helper function, that could assert also array boundaries. Thanks for suggestion, which makes the code much easier to read. 
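As a rough sketch of the "inline helper function that could also assert array boundaries" variant suggested above: the helper below computes the space left in a fixed-size buffer and is used in the same `pos += snprintf(...)` style. The names `remaining_buflen` (as a function template rather than a macro) and `format_pair` are illustrative assumptions and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: bytes left in `buffer` starting at `position`.
// Taking the array by reference lets the compiler supply its size (N).
template <size_t N>
inline size_t remaining_buflen(const char (&buffer)[N], const char* position) {
  assert(position >= buffer && position <= buffer + N);  // position must lie inside the buffer
  return N - static_cast<size_t>(position - buffer);
}

// Formats "a=<a> b=<b>" in two steps, advancing a cursor after each snprintf.
// Returns the number of characters written (excluding the NUL), or -1 if the
// output was truncated or snprintf reported an error.
template <size_t N>
inline int format_pair(char (&buf)[N], int a, int b) {
  char* pos = buf;

  int n = std::snprintf(pos, remaining_buflen(buf, pos), "a=%d", a);
  // snprintf returns the length it *wanted* to write, so check before advancing.
  if (n < 0 || static_cast<size_t>(n) >= remaining_buflen(buf, pos)) return -1;
  pos += n;

  n = std::snprintf(pos, remaining_buflen(buf, pos), " b=%d", b);
  if (n < 0 || static_cast<size_t>(n) >= remaining_buflen(buf, pos)) return -1;
  pos += n;

  return static_cast<int>(pos - buf);
}
```

One design point worth noting: because `snprintf` returns the would-be length on truncation, blindly doing `pos += snprintf(...)` can move the cursor past the end of the buffer; the explicit check (and the assert in the helper) guards against that.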
------------- PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Mon Nov 14 19:44:17 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 14 Nov 2022 19:44:17 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. > > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: delete swp file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/32e18955..ca4ddcc4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=04-05 Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From dnsimon at openjdk.org Mon Nov 14 20:13:39 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 14 Nov 2022 20:13:39 GMT Subject: RFR: 8296956: [JVMCI] HotSpotResolvedJavaFieldImpl.getIndex returns wrong value Message-ID: <KZxa8Bort_KINFAAyAIGsZgRF6PBRxJGONS_u8t_rVI=.de61e7e8-518a-4cf5-b2b5-68cf4f71cb42@github.com> This PR fixes a bug related to `HotSpotResolvedJavaFieldImpl.index`. 
Its value is passed into the `HotSpotResolvedJavaFieldImpl` constructor as an `int`, and is returned by `getIndex()` as an `int`, but it was stored as a `short`. This meant that unsigned 16-bit values were not handled correctly (Java's `short` is signed, so values above 32767 do not round-trip). Also included are some related JVMCI cleanups: * added and fixed doc related to `ResolvedJavaField.getOffset()` * replaced assertions with always-enabled checks ------------- Commit messages: - fixed HotSpotResolvedJavaFieldImpl.getIndex - replaced assertions with always-enabled checks - added and fixed doc related to ResolvedJavaField.getOffset() Changes: https://git.openjdk.org/jdk/pull/11142/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11142&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296956 Stats: 180 lines in 7 files changed: 137 ins; 5 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/11142.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11142/head:pull/11142 PR: https://git.openjdk.org/jdk/pull/11142 From sgehwolf at openjdk.org Mon Nov 14 20:27:36 2022 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Mon, 14 Nov 2022 20:27:36 GMT Subject: RFR: 8296671: [JFR] jdk.ContainerConfiguration event should include host total memory Message-ID: <oeClK9flGtbMJwedbMytX5I_94NaXNkNQ9XMjACBsSI=.7d360f67-460a-4631-9743-32e970b16a9f@github.com> Please review this addition to the jdk.ContainerConfiguration event, which adds information about the container host: specifically, the total amount of memory of the host system. Testing: - [x] New test case (passed, fails before) - [ ] JFR tests (ongoing) Thoughts?
------------- Commit messages: - 8296671: [JFR] jdk.ContainerConfiguration event should include host total memory Changes: https://git.openjdk.org/jdk/pull/11143/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11143&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296671 Stats: 48 lines in 7 files changed: 43 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/11143.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11143/head:pull/11143 PR: https://git.openjdk.org/jdk/pull/11143 From duke at openjdk.org Mon Nov 14 21:29:08 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 14 Nov 2022 21:29:08 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> Message-ID: <O8EAjZwOmYlRKNZf5HTf5B4up562VoNdC6I4han2rzg=.a405f0e5-8264-49a2-a1ca-292e2b603f7d@github.com> On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. 
>> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s > > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge remote-tracking branch 'origin/master' into avx512-poly > - Vladimir's review > - live review with Sandhya > - jcheck > - Sandhya's review > - fix windows and 32b linux builds > - add getLimbs to interface and reviews > - fix 32-bit build > - make UsePolyIntrinsics option diagnostic > - Merge remote-tracking branch 'origin/master' into avx512-poly > - ... and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db (Build finally passing!) Hi @TobiHartmann you had mentioned there were some more tests to run? Looking to see what else needs fixing. Thanks. @iwanowww thanks for the reviews! As you have time, let me know what else you see or if its good for approval? Don't want to switch too much to another intrinsic yet, one crypto algorithm is about what I can fit into my brain at a time.
------------- PR: https://git.openjdk.org/jdk/pull/10582 From dholmes at openjdk.org Mon Nov 14 21:54:59 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 21:54:59 GMT Subject: RFR: 8296926: Use proper include lines for files in include/ In-Reply-To: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> Message-ID: <NnikEO-HpI3xhdlNu8Ibtk-e9VADlVBPJMwl04PPLwk=.6bf1d7af-1b9a-4bca-ac62-2670949159b9@github.com> On Mon, 14 Nov 2022 09:25:11 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that this was done on purpose, but it just adds another exception to the include rules that doesn't serve any practical purpose, IMHO. It also goes against our written style guide around include files. One argument for why it was OK to have the files in include/ pushed up to the top of the sorted block was that the file was included without specifying a directory. That argument contradicts how we treat platform-dependent files, which (unfortunately) often are also specified without a prefixed directory. To remove this special case, I've removed the extraneous make file entry that put src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. Okay I'm fine with `jvm.h` being placed in sort order rather than adding the `include/` part. Thanks.
------------- PR: https://git.openjdk.org/jdk/pull/11133 From dholmes at openjdk.org Mon Nov 14 22:50:57 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Nov 2022 22:50:57 GMT Subject: RFR: JDK-8296931: NMT tests slowed down considerably by JDK-8242181 In-Reply-To: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> References: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> Message-ID: <DD2hp0FmrDwZLlwDKXq4OtH5qNptO_wD4TVMtMkmZfM=.8b60b016-6bd7-4b96-bbe5-b8424688725b@github.com> On Mon, 14 Nov 2022 12:43:41 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > We noticed that NMT tests on our slower PPC machines started failing. > > The reason is that NMT detail reports have become 2-5x slower. This is caused by us now parsing the dwarf debug information to extract source information for each PC in each call stack. That is nice but costly. > > The slowdown is not limited to PPC, it affects all Elf platforms. On my Linux x64 box, runtime/NMT/VirtualAllocCommitMerge.java increased from 20 to 90 seconds. > > --- > > This patch simply removes source info from NMT call stacks. They are not that important for pinpointing leaks and such. I considered more involved solutions, like making them optional via an argument to the NMT report command, but decided against it. The added benefit would be small, not worth much complexity. > > With this patch, on my box with -conc 4 all NMT together are about 2.5 x faster (2m56 -> 1m09). I think we need some input from @chhagedorn on this. 
------------- PR: https://git.openjdk.org/jdk/pull/11135 From iklam at openjdk.org Mon Nov 14 22:54:00 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 14 Nov 2022 22:54:00 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled [v3] In-Reply-To: <ydQdOtTlFJ407X_AGYWpPSksUJEjvs7B9zZxsOO7EKU=.5fd56bb1-dc5c-4b8f-8891-d44cfec095d4@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <ydQdOtTlFJ407X_AGYWpPSksUJEjvs7B9zZxsOO7EKU=.5fd56bb1-dc5c-4b8f-8891-d44cfec095d4@github.com> Message-ID: <TUK8PtdBJSTqzsCQwXwYKd-XPis-x3gNVZh1L-AMhfg=.d394a22c-1c1e-4848-8445-f5ea47c09048@github.com> On Mon, 14 Nov 2022 09:26:47 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. >> >> The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : >> >> >> #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(0) : NativeCallStack::empty_stack()) >> #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(1) : NativeCallStack::empty_stack()) >> >> >> and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc: >> >> >> void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack); >> >> >> In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()). >> >> However, that does not work as intended. 
The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots): >> >> >> 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>: >> ... >> # Load tracking level >> cb9a77: 48 8d 1d 02 35 78 00 lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE> >> cb9a7e: 8b 03 mov (%rbx),%eax >> # detail (3) tracking? >> cb9a80: 83 f8 03 cmp $0x3,%eax >> # yes: go and collect callstack >> cb9a83: 0f 84 57 01 00 00 je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180> >> # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals: >> cb9a89: 48 8d 05 30 44 78 00 lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE> >> cb9a90: f3 0f 6f 00 movdqu (%rax),%xmm0 >> cb9a94: f3 0f 6f 48 10 movdqu 0x10(%rax),%xmm1 >> cb9a99: 0f 11 45 c0 movups %xmm0,-0x40(%rbp) >> cb9a9d: 0f 11 4d d0 movups %xmm1,-0x30(%rbp) >> ... >> # do the actual malloc: >> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> >> >> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX): >> cb9b0f: 48 8d 4d c0 lea -0x40(%rbp),%rcx >> ... >> cb9b19: e8 f2 b7 f3 ff callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled. >> >> --------------------- >> >> The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only. >> >> This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC. 
>> >> In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway: >> >> >> 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>: >> ... >> # load tracking level >> cb9907: 48 8d 1d 72 46 78 00 lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE> >> cb990e: 8b 03 mov (%rbx),%eax >> # detail (3) tracking? >> cb9910: 83 f8 03 cmp $0x3,%eax >> # yes: go and collect callstack >> cb9913: 0f 84 37 01 00 00 je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160> >> # no: nothing more to do ... >> ... >> # do the actual malloc: >> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> >> ... >> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway: >> cb9987: 48 8d 4d c0 lea -0x40(%rbp),%rcx >> .. >> cb9991: e8 ba b8 f3 ff callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent. >> >> -------------- >> >> Results: >> >> When profiling, I see os::malloc now needs less cycles, and the hotspot around the xmm instructions is not there anymore. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback Ioi and David The latest versions looks good to me. ------------- Marked as reviewed by iklam (Reviewer). 
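The copy described in the quoted text follows from how the conditional operator treats a temporary on one side and a reference to an existing object on the other: the result is a temporary, so the referenced object gets copied. Below is a minimal sketch with a counting copy constructor that makes this observable. `CallStack` is a simplified, hypothetical stand-in for `NativeCallStack`, and the helper names are invented for illustration; the zero-copy result for the second variant assumes C++17 copy elision.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for NativeCallStack; everything here is illustrative.
struct CallStack {
  static const size_t kFrames = 4;
  void* frames[kFrames];

  // Counts copy-constructor invocations so the hidden copy becomes observable.
  static int& copy_count() { static int n = 0; return n; }

  CallStack() {}  // deliberately a no-op: frames are left uninitialized

  explicit CallStack(int frames_to_skip) {
    assert(frames_to_skip >= 0);
    // A real implementation would walk the native stack here.
    for (size_t i = 0; i < kFrames; i++) frames[i] = nullptr;
  }

  CallStack(const CallStack& other) {
    for (size_t i = 0; i < kFrames; i++) frames[i] = other.frames[i];
    ++copy_count();
  }

  static const CallStack& empty_stack() {
    static const CallStack empty(0);
    return empty;
  }
};

// Callee that only looks at the stack in "detail" mode, like the tracker.
inline const void* record(const CallStack& stack, bool detail) {
  return detail ? stack.frames[0] : nullptr;
}

// Old macro shape: temporary on one side, reference to a singleton on the other.
inline int copies_with_singleton(bool detail) {
  CallStack::copy_count() = 0;
  record(detail ? CallStack(1) : CallStack::empty_stack(), detail);
  return CallStack::copy_count();
}

// New macro shape: both sides are temporaries; the no-op constructor costs nothing.
inline int copies_with_default_ctor(bool detail) {
  CallStack::copy_count() = 0;
  record(detail ? CallStack(1) : CallStack(), detail);
  return CallStack::copy_count();
}
```

With tracking "off" (`detail == false`), the singleton variant performs one copy of the empty stack per call, while the default-constructor variant performs none.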
PR: https://git.openjdk.org/jdk/pull/11040 From jrose at openjdk.org Mon Nov 14 23:03:01 2022 From: jrose at openjdk.org (John R Rose) Date: Mon, 14 Nov 2022 23:03:01 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <6KaO6YDJAQZSps49h6TddX8-aXFEfOFCfLgpi1_90Ag=.d7fe0ac9-d392-4784-a13e-85f5212e00f1@github.com> References: <mgQHdsI_oHeWVQEOQNQLrfplcvatEauNPfq1rEswJF4=.cc842c02-e7e0-4f72-95a0-1033ce101cfe@github.com> <TlQR1R0Jt3DqqMWNwZzUyRpSs7lqI0Cig2zpUnmYI3s=.e3add258-443b-4adb-9b31-9f9a76042ff4@github.com> <NDsoMk5BjB0oLGW6pQagOrm-CWrPjc-wwfRhA3vJt6g=.10119bb3-2d8b-4d53-bfaa-9f7a01dabfd7@github.com> <TCLX4fpqVeou-wQDj1SBil2xIyIzk2NFcj7pPpz_xjs=.4152872e-18aa-426b-b967-68118f7ba62d@github.com> <6KaO6YDJAQZSps49h6TddX8-aXFEfOFCfLgpi1_90Ag=.d7fe0ac9-d392-4784-a13e-85f5212e00f1@github.com> Message-ID: <_C2oCFsbq1QdFO_HjwfXHNt0XrtV06TqRK1a8lpiXsI=.4650c115-d734-4655-bc6a-ec46314ab5ed@github.com> On Fri, 28 Oct 2022 01:47:23 GMT, David Holmes <dholmes at openjdk.org> wrote: > So the data structure for lock records (per thread) could consist of a series of distinct values [ A B C ] and each of the values could be repeated, but only adjacently: [ A A A B C C ] for example. > @rose00 why only adjacently? Nested locking can be interleaved on different monitors. Yes it can; you can have nesting A, B, A. But the thread-based fast-locking list might not cover that case. If it were restricted to only adjacent records in the way I sketched, it would need to use a different, slower technique for the A, B, A case. The trade-off is that if you only allow adjacent recursive locks on the list, you don't need to search the list beyond the first element, to detect re-locking. Dunno if that pencils out to a real advantage, though, since the fallback is slow. 
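To make the trade-off concrete, here is a toy, single-threaded sketch of a per-thread lock stack that recognizes recursion only when the object is already on top of the stack, so A, A is handled on the fast path while A, B, A falls back to the slow path. Every name here is hypothetical, the `fast_locked` flag merely stands in for the header lock bits, and a real implementation would need a CAS on the object header and proper contention handling.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for an object header: one flag instead of the mark-word lock bits.
struct Obj {
  bool fast_locked = false;
};

enum class EnterResult { Fast, Recursive, Slow };

// Hypothetical per-thread lock stack (not the data structure from the PR).
class LockStack {
 public:
  EnterResult enter(Obj* obj) {
    assert(obj != nullptr);
    if (_top == kCapacity) {
      return EnterResult::Slow;  // no room left: take the slow path
    }
    // Re-locking is detected by looking at the top entry only; no search.
    if (_top > 0 && _entries[_top - 1] == obj) {
      _entries[_top++] = obj;
      return EnterResult::Recursive;
    }
    if (!obj->fast_locked) {
      obj->fast_locked = true;  // a real version would CAS the header here
      _entries[_top++] = obj;
      return EnterResult::Fast;
    }
    // Already locked, but not by our top entry: either another thread owns it
    // or this is the interleaved A, B, A case. Both go to the slow path.
    return EnterResult::Slow;
  }

  size_t depth() const { return _top; }

 private:
  static const size_t kCapacity = 8;
  Obj* _entries[kCapacity];
  size_t _top = 0;
};
```

Whether the cheaper top-only check pays off depends, as noted above, on how often the interleaved case occurs and how expensive the fallback is.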
------------- PR: https://git.openjdk.org/jdk/pull/10590 From jrose at openjdk.org Mon Nov 14 23:17:03 2022 From: jrose at openjdk.org (John R Rose) Date: Mon, 14 Nov 2022 23:17:03 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v8] In-Reply-To: <Z8Gxg_4zh13llvce7INDaFCirCegSt2U56btzhfnfa4=.1a042e6a-b446-4dc2-887b-fd7cc3ca79c7@github.com> References: <mgQHdsI_oHeWVQEOQNQLrfplcvatEauNPfq1rEswJF4=.cc842c02-e7e0-4f72-95a0-1033ce101cfe@github.com> <Z8Gxg_4zh13llvce7INDaFCirCegSt2U56btzhfnfa4=.1a042e6a-b446-4dc2-887b-fd7cc3ca79c7@github.com> Message-ID: <uEK8czh8hFTvUw650Tb2gQe1gY9YcZA2UrRHA_YAhoY=.0028ab36-6897-4753-a668-cebfe07941bf@github.com> On Fri, 28 Oct 2022 09:32:58 GMT, Roman Kennke <rkennke at openjdk.org> wrote: >> This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. 
Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are only performed outside performance-critical paths (usually in code like JVMTI or deadlock detection), where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1.
The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> ### Benchmarks >> >> All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. >> >> #### DaCapo/AArch64 >> >> Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. 
I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference?
>>
>> benchmark | baseline | fast-locking | % | size
>> -- | -- | -- | -- | --
>> avrora | 27859 | 27563 | 1.07% | large
>> batik | 20786 | 20847 | -0.29% | large
>> biojava | 27421 | 27334 | 0.32% | default
>> eclipse | 59918 | 60522 | -1.00% | large
>> fop | 3670 | 3678 | -0.22% | default
>> graphchi | 2088 | 2060 | 1.36% | default
>> h2 | 297391 | 291292 | 2.09% | huge
>> jme | 8762 | 8877 | -1.30% | default
>> jython | 18938 | 18878 | 0.32% | default
>> luindex | 1339 | 1325 | 1.06% | default
>> lusearch | 918 | 936 | -1.92% | default
>> pmd | 58291 | 58423 | -0.23% | large
>> sunflow | 32617 | 24961 | 30.67% | large
>> tomcat | 25481 | 25992 | -1.97% | large
>> tradebeans | 314640 | 311706 | 0.94% | huge
>> tradesoap | 107473 | 110246 | -2.52% | huge
>> xalan | 6047 | 5882 | 2.81% | default
>> zxing | 970 | 926 | 4.75% | default
>>
>> #### DaCapo/x86_64
>>
>> The following measurements have been taken on an Intel Xeon Scalable Processor (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations apply as in the measurements above.
>>
>> benchmark | baseline | fast-locking | % | size
>> -- | -- | -- | -- | --
>> avrora | 127690 | 126749 | 0.74% | large
>> batik | 12736 | 12641 | 0.75% | large
>> biojava | 15423 | 15404 | 0.12% | default
>> eclipse | 41174 | 41498 | -0.78% | large
>> fop | 2184 | 2172 | 0.55% | default
>> graphchi | 1579 | 1560 | 1.22% | default
>> h2 | 227614 | 230040 | -1.05% | huge
>> jme | 8591 | 8398 | 2.30% | default
>> jython | 13473 | 13356 | 0.88% | default
>> luindex | 824 | 813 | 1.35% | default
>> lusearch | 962 | 968 | -0.62% | default
>> pmd | 40827 | 39654 | 2.96% | large
>> sunflow | 53362 | 43475 | 22.74% | large
>> tomcat | 27549 | 28029 | -1.71% | large
>> tradebeans | 190757 | 190994 | -0.12% | huge
>> tradesoap | 68099 | 67934 | 0.24% | huge
>> xalan | 7969 | 8178 | -2.56% | default
>> zxing | 1176 | 1148 | 2.44% | default
>>
>> #### Renaissance/AArch64
>>
>> This tests Renaissance/JMH version 0.14.1 on the same machines as DaCapo above, with the same JVM settings.
>>
>> benchmark | baseline | fast-locking | %
>> -- | -- | -- | --
>> AkkaUct | 2558.832 | 2513.594 | 1.80%
>> Reactors | 14715.626 | 14311.246 | 2.83%
>> Als | 1851.485 | 1869.622 | -0.97%
>> ChiSquare | 1007.788 | 1003.165 | 0.46%
>> GaussMix | 1157.491 | 1149.969 | 0.65%
>> LogRegression | 717.772 | 733.576 | -2.15%
>> MovieLens | 7916.181 | 8002.226 | -1.08%
>> NaiveBayes | 395.296 | 386.611 | 2.25%
>> PageRank | 4294.939 | 4346.333 | -1.18%
>> FjKmeans | 496.076 | 493.873 | 0.45%
>> FutureGenetic | 2578.504 | 2589.255 | -0.42%
>> Mnemonics | 4898.886 | 4903.689 | -0.10%
>> ParMnemonics | 4260.507 | 4210.121 | 1.20%
>> Scrabble | 139.37 | 138.312 | 0.76%
>> RxScrabble | 320.114 | 322.651 | -0.79%
>> Dotty | 1056.543 | 1068.492 | -1.12%
>> ScalaDoku | 3443.117 | 3449.477 | -0.18%
>> ScalaKmeans | 259.384 | 258.648 | 0.28%
>> Philosophers | 24333.311 | 23438.22 | 3.82%
>> ScalaStmBench7 | 1102.43 | 1115.142 | -1.14%
>> FinagleChirper | 6814.192 | 6853.38 | -0.57%
>> FinagleHttp | 4762.902 | 4807.564 | -0.93%
>>
>> #### Renaissance/x86_64
>>
>> benchmark | baseline | fast-locking | %
>> -- | -- | -- | --
>> AkkaUct | 1117.185 | 1116.425 | 0.07%
>> Reactors | 11561.354 | 11812.499 | -2.13%
>> Als | 1580.838 | 1575.318 | 0.35%
>> ChiSquare | 459.601 | 467.109 | -1.61%
>> GaussMix | 705.944 | 685.595 | 2.97%
>> LogRegression | 659.944 | 656.428 | 0.54%
>> MovieLens | 7434.303 | 7592.271 | -2.08%
>> NaiveBayes | 413.482 | 417.369 | -0.93%
>> PageRank | 3259.233 | 3276.589 | -0.53%
>> FjKmeans | 946.429 | 938.991 | 0.79%
>> FutureGenetic | 1760.672 | 1815.272 | -3.01%
>> ParMnemonics | 2016.917 | 2033.101 | -0.80%
>> Scrabble | 147.996 | 150.084 | -1.39%
>> RxScrabble | 177.755 | 177.956 | -0.11%
>> Dotty | 673.754 | 683.919 | -1.49%
>> ScalaDoku | 2193.562 | 1958.419 | 12.01%
>> ScalaKmeans | 165.376 | 168.925 | -2.10%
>> ScalaStmBench7 | 1080.187 | 1049.184 | 2.95%
>> Philosophers | 14268.449 | 13308.87 | 7.21%
>> FinagleChirper | 4722.13 | 4688.3 | 0.72%
>> FinagleHttp | 3497.241 | 3605.118 | -2.99%
>>
>> Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and will probably address by running them much more often).
>>
>> I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious).
>>
>> Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here.
>> >> ### Testing >> - [x] tier1 (x86_64, aarch64, x86_32) >> - [x] tier2 (x86_64, aarch64) >> - [x] tier3 (x86_64, aarch64) >> - [x] tier4 (x86_64, aarch64) >> - [x] jcstress 3-days -t sync -af GLOBAL (x86_64, aarch64) > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: > > - Merge remote-tracking branch 'upstream/master' into fast-locking > - Merge remote-tracking branch 'upstream/master' into fast-locking > - Merge remote-tracking branch 'upstream/master' into fast-locking > - More RISC-V fixes > - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - RISC-V port > - Revert "Re-use r0 in call to unlock_object()" > > This reverts commit ebbcb615a788998596f403b47b72cf133cb9de46. > - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - Fix number of rt args to complete_monitor_locking_C, remove some comments > - Re-use r0 in call to unlock_object() > - ... and 27 more: https://git.openjdk.org/jdk/compare/4b89fce0...3f0acba4 FTR I agree with Holmes that a conditional opt-in is better. While we are uncertain of the viability of the new scheme (which FTR I like!) for all our customers, we need to have a dynamic selection of the technique, so we can turn it on and off. Off by default at first, then later on by default, then on with no option at all if all goes well (which I hope it does). Perhaps Lilliput can have it turned on by default, and throw an error if (for some reason) the user tries to turn it off again. That's the way we phased in, and then phased out, biased locking, and it seems to me that this is a closely similar situation. Eventually, if all goes well, we can remove the stack locking code, as we did with biased locking. For more details about that long-running saga, one might look at the history of `-XX:+UseBiasedLocking` in the source base, perhaps starting with `$ git log -S UseBiasedLocking`. 
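For readers skimming the quoted description, the ANONYMOUS_OWNER hand-over can be pictured with the following deliberately simplified sketch. It is a single-threaded model in plain Java under my own assumptions: `AnonymousOwnerSketch`, `Monitor`, `inflateFastLocked` and `exit` are hypothetical names, and a real implementation would need atomic updates and would only take this path after finding the object on the exiting thread's own lock-stack.

```java
/** Hypothetical, single-threaded model of the ANONYMOUS_OWNER hand-over. */
final class AnonymousOwnerSketch {
    /** Marker meaning "some thread holds the fast-lock, but we do not know which". */
    static final Object ANONYMOUS_OWNER = new Object();

    /** Stand-in for an inflated monitor. */
    static final class Monitor {
        Object owner;   // a Thread, ANONYMOUS_OWNER, or null when free
    }

    /** Contending thread: inflates a fast-locked object without knowing its owner. */
    static Monitor inflateFastLocked() {
        Monitor m = new Monitor();
        m.owner = ANONYMOUS_OWNER;
        return m;
    }

    /** Exit path of the thread that actually holds the lock. */
    static void exit(Monitor m, Thread self) {
        if (m.owner == ANONYMOUS_OWNER) {
            m.owner = self;               // "it must be me": fix up the owner first
        }
        if (m.owner != self) {
            throw new IllegalMonitorStateException("not the owner");
        }
        m.owner = null;                   // regular exit; a waiting thread may now acquire
    }
}
```

The point of the marker is that the contending thread never has to find out who the owner is; the owner identifies itself when it reaches its own exit.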
------------- PR: https://git.openjdk.org/jdk/pull/10590 From kvn at openjdk.org Mon Nov 14 23:21:14 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 14 Nov 2022 23:21:14 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> Message-ID: <4VgZ82kW_Fc5dwN2IimRW7StzUF8tWaJjDq4hRrhUoI=.e943ec6d-c3db-4655-8744-39c858767b45@github.com> On Wed, 9 Nov 2022 07:57:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote: > The LEA instruction loads the effective address, but the MD5 intrinsic uses it for computing values rather than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. > > This change replaces > LEA: r1 = r1 + rsi * 1 + t > with > ADDs: r1 += t; r1 += rsi. > > Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. > > No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. > > Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake. Yes, please, post performance data. Note, TestMD5Intrinsics and TestMD5MultiBlockIntrinsics are regression/correctness tests. It would be nice to have proper JMH benchmarks to show the improvement. @sviswa7 or @jatin-bhateja do you agree with these changes?
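As a quick sanity check on the quoted rewrite, the two forms compute the same 32-bit value, including on wrap-around. The sketch below only models the arithmetic in Java `int`s; `LeaVersusAdds` and its method names are made up for this note, and it says nothing about cycle counts, which is what the requested JMH data would have to show.

```java
/** Hypothetical model of the two instruction sequences using 32-bit Java ints. */
final class LeaVersusAdds {
    /** Models `LEA: r1 = r1 + rsi * 1 + t` as one three-operand computation. */
    static int lea(int r1, int rsi, int t) {
        return r1 + rsi * 1 + t;
    }

    /** Models the replacement `ADDs: r1 += t; r1 += rsi`. */
    static int twoAdds(int r1, int rsi, int t) {
        r1 += t;
        r1 += rsi;
        return r1;
    }
}
```

Both methods agree for every input because Java `int` addition wraps modulo 2^32 and is associative, just like the 32-bit register arithmetic being modeled.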
------------- PR: https://git.openjdk.org/jdk/pull/11054 From sviswanathan at openjdk.org Mon Nov 14 23:42:28 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 14 Nov 2022 23:42:28 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v3] In-Reply-To: <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> Message-ID: <psMr8pmTuDtbyh_3LrTHo9v1ln_VKImuYz7UkX4MHeM=.368b4dc0-4f03-4cc4-8f27-1f8707cf4237@github.com> On Thu, 10 Nov 2022 20:11:46 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: >> >> - x86_64: AVX, AVX2 and AVX512 >> - aarch64: platforms that support the advanced SIMD instructions >> >> Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. >> >> Special thanks to the folks who have made many helpful comments while this PR was in draft form. > > Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: > > replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations Marked as reviewed by sviswanathan (Reviewer). The x86_64 code looks good to me. 
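For context on what the core block function is built from, here is the ChaCha20 quarter round from RFC 8439, section 2.1, in plain Java. This is a reference sketch only, not the intrinsic: `ChaChaQuarterRound` is a name chosen for this note, and my reading that the "byte-aligned rotations" in the commit message are the rotations by 16 and 8 is an assumption.

```java
/** Plain-Java ChaCha20 quarter round as specified in RFC 8439, section 2.1. */
final class ChaChaQuarterRound {
    /** Applies one quarter round in place to s[a], s[b], s[c], s[d]. */
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);  // multiple of 8 bits
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);   // multiple of 8 bits
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }
}
```

A rotation by a multiple of 8 bits only moves whole bytes around, which is why it can be expressed as a shuffle instead of a shift/shift/or sequence; the rotations by 12 and 7 cannot.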
------------- PR: https://git.openjdk.org/jdk/pull/7702 From vlivanov at openjdk.org Tue Nov 15 00:23:53 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Nov 2022 00:23:53 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> Message-ID: <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com>

On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
>> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
>> - Added a JMH perf test.
>> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>>
>> Perf before:
>>
>> Benchmark (dataSize) (provider) Mode Cnt Score Error Units
>> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s
>> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s
>> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s
>> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s
>> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s
>>
>> and after:
>>
>> Benchmark (dataSize) (provider) Mode Cnt Score Error Units
>> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s
>> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s
>> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s
>> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s
>> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s
>
> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits:
>
> - Merge remote-tracking branch 'origin/master' into avx512-poly
> - Vladimir's review
> - live review with Sandhya
> - jcheck
> - Sandhya's review
> - fix windows and 32b linux builds
> - add getLimbs to interface and reviews
> - fix 32-bit build
> - make UsePolyIntrinsics option diagnostic
> - Merge remote-tracking branch 'origin/master' into avx512-poly
> - ... and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db

src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 103:

> 101:
> 102: ATTRIBUTE_ALIGNED(64) uint64_t POLY1305_MASK44[] = {
> 103: // OFFSET 64: mask_44

Redundant comment.

src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384:

> 382: void StubGenerator::poly1305_limbs(const Register limbs, const Register a0, const Register a1, const Register a2, bool only128)
> 383: {
> 384: const Register t1 = r13;

Please, make the temps explicit and lift them into arguments. Otherwise, it's hard to see what registers are clobbered when helper methods are called.

src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 387:

> 385: const Register t2 = r14;
> 386:
> 387: __ movq(a0, Address(limbs, 0));

I don't understand how it works. `limbs` comes directly from `c_rarg2` and contains raw oop. So, `Address(limbs, 0)` reads object mark word rather than the first element from the array.

(Same situation in `poly1305_limbs_out`. And now I'm curious why doesn't object header corruption trigger a crash.)
src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 987: > 985: > 986: // Load R into r1:r0 > 987: poly1305_limbs(R, r0, r1, r1, true); What's the intention here when you pass `r1` twice? Just load `R[0]` and `R[2]`. You could use `noreg` to mark an optional operation and check for it in `poly1305_limbs` before loading the corresponding element. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From sviswanathan at openjdk.org Tue Nov 15 00:28:06 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 15 Nov 2022 00:28:06 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> Message-ID: <-JVYIHKOY_LuVTqyH5xuubtPdk8pK_wi5z-8pestRis=.e63938ab-0ac2-4880-8238-e6e6d8debf03@github.com> On Tue, 15 Nov 2022 00:10:35 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's review >> - live review with Sandhya >> - jcheck >> - Sandhya's review >> - fix windows and 32b linux builds >> - add getLimbs to interface and reviews >> - fix 32-bit build >> - make UsePolyIntrinsics option diagnostic >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - ... 
and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 387: > >> 385: const Register t2 = r14; >> 386: >> 387: __ movq(a0, Address(limbs, 0)); > > I don't understand how it works. `limbs` comes directly from `c_rarg2` and contains raw oop. So, `Address(limbs, 0)` reads object mark word rather than the first element from the array. > > (Same situation in `poly1305_limbs_out`. And now I'm curious why doesn't object header corruption trigger a crash.) library_call.cpp takes care of that, it passes the address of 0'th element to the stub. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Tue Nov 15 00:36:06 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Nov 2022 00:36:06 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v4] In-Reply-To: <FkF6UVTR36LLSMwEk6TQRHD54w34HXls5pI-yguV5ec=.303a0ba0-0f4d-4948-91ef-a2630063ab83@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <FkF6UVTR36LLSMwEk6TQRHD54w34HXls5pI-yguV5ec=.303a0ba0-0f4d-4948-91ef-a2630063ab83@github.com> Message-ID: <Uxzq6nU2R5YKau-r1EK9sp4_EMOyhnyYzs-OKTwa2HE=.0b518ea9-f5e3-4894-9d71-069a474d56fb@github.com> On Thu, 10 Nov 2022 16:48:19 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. >> >> This is split off from the main JEP integration to make reviewing easier. >> >> This includes the following patches: >> >> 1. https://github.com/openjdk/panama-foreign/pull/698 >> 2. https://github.com/openjdk/panama-foreign/pull/699 >> 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 >> 4. https://github.com/openjdk/panama-foreign/pull/740 >> 5. https://github.com/openjdk/panama-foreign/pull/746 >> 6. 
https://github.com/openjdk/panama-foreign/pull/742 >> 7. https://github.com/openjdk/panama-foreign/pull/743 >> >> Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. >> >> The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. >> >> Please refer to the PR of each individual patch for a more detailed description. > > Jorn Vernee has updated the pull request incrementally with three additional commits since the last revision: > > - Tweak copyright headers > - Use @requires to disable some tests on x86 > - Use AssertionError for internal exceptions Haven't finished reviewing VM part. 2 questions so far: * `VMStorage` looks very similar to `VMReg`. What's the purpose of the new representation? * why do you structure the header files the way you do? `vmstorage.inline.hpp`, `vmstorage_<cpu>.inline.hpp`, `vmstorageBase.inline.hpp` instead of just `vmstorage.hpp`/`vmstorage_<cpu>.hpp` src/hotspot/cpu/aarch64/downcallLinker_aarch64.cpp line 146: > 144: Register tmp2 = r10; > 145: > 146: VMStorage shuffle_reg = VMS_R19; I'd prefer to see `as_VMStorage(Register)` used instead and all `VMS_...` constants go away. src/hotspot/cpu/aarch64/foreignGlobals_aarch64.cpp line 51: > 49: > 50: objArrayOop inputStorage = jdk_internal_foreign_abi_ABIDescriptor::inputStorage(abi_oop); > 51: parse_register_array(inputStorage, (int) StorageType::INTEGER, abi._integer_argument_registers, as_Register); Converting `type_index` argument from `int` to `StorageType` would allow to avoid explicit casts. src/hotspot/cpu/aarch64/vmstorage_aarch64.inline.hpp line 68: > 66: } > 67: > 68: inline VMStorage as_VMStorage(Register reg) { Mark as `constexpr` maybe? 
src/hotspot/cpu/x86/downcallLinker_x86_64.cpp line 239:

> 237: __ vzeroupper();
> 238:
> 239: if(should_save_return_value) {

Missing space (here and below).

------------- PR: https://git.openjdk.org/jdk/pull/11019 From vlivanov at openjdk.org Tue Nov 15 00:51:09 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Nov 2022 00:51:09 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> Message-ID: <XAAaQDoWmjbJ0coZZaBwpzxtDMbazRvX6yFPNexV3j4=.93142a64-e120-4f01-8801-d4a0f135c8f7@github.com>

On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
>> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
>> - Added a JMH perf test.
>> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>>
>> Perf before:
>>
>> Benchmark (dataSize) (provider) Mode Cnt Score Error Units
>> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s
>> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s
>> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s
>> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s
>> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s
>>
>> and after:
>>
>> Benchmark (dataSize) (provider) Mode Cnt Score Error Units
>> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s
>> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s
>> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s
>> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s
>> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s
>
> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits:
>
> - Merge remote-tracking branch 'origin/master' into avx512-poly
> - Vladimir's review
> - live review with Sandhya
> - jcheck
> - Sandhya's review
> - fix windows and 32b linux builds
> - add getLimbs to interface and reviews
> - fix 32-bit build
> - make UsePolyIntrinsics option diagnostic
> - Merge remote-tracking branch 'origin/master' into avx512-poly
> - ... and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db

src/hotspot/share/opto/library_call.cpp line 6976:

> 6974:
> 6975: if (!stubAddr) return false;
> 6976: Node* input = argument(1);

Receiver null check is missing. Since the method being intrinsified is non-static, the intrinsic itself has to take care of receiver null check.
------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Tue Nov 15 00:51:07 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Nov 2022 00:51:07 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <-JVYIHKOY_LuVTqyH5xuubtPdk8pK_wi5z-8pestRis=.e63938ab-0ac2-4880-8238-e6e6d8debf03@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> <-JVYIHKOY_LuVTqyH5xuubtPdk8pK_wi5z-8pestRis=.e63938ab-0ac2-4880-8238-e6e6d8debf03@github.com> Message-ID: <OfSiSh0ho8oFGm5lOgCGxw84XjHzQHdXvyDOcZBQBXo=.f243574a-c6aa-47b2-8bb1-7951232bf83d@github.com> On Tue, 15 Nov 2022 00:25:46 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 387: >> >>> 385: const Register t2 = r14; >>> 386: >>> 387: __ movq(a0, Address(limbs, 0)); >> >> I don't understand how it works. `limbs` comes directly from `c_rarg2` and contains raw oop. So, `Address(limbs, 0)` reads object mark word rather than the first element from the array. >> >> (Same situation in `poly1305_limbs_out`. And now I'm curious why doesn't object header corruption trigger a crash.) > > library_call.cpp takes care of that, it passes the address of 0'th element to the stub. Ah, got it. Worth elaborating that in the comments. 
Otherwise, they confuse rather than help:

// void processBlocks(byte[] input, int len, int[5] a, int[5] r)
const Register input = rdi; //input+offset
const Register length = rbx;
const Register accumulator = rcx;
const Register R = r8;

------------- PR: https://git.openjdk.org/jdk/pull/10582 From jwaters at openjdk.org Tue Nov 15 00:52:07 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 15 Nov 2022 00:52:07 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <ZFIm743da2bNDg0P_C6hPwK8ld2QAZSiR1vt3rQJFC0=.1d280d20-15a1-42fb-970b-07e565a320c5@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <x8nDS5oBvPgUmNTQie92RqlCGQpIvXY2Ribuu-YIeg0=.541a91ec-7dda-49b5-a274-6c127d6b1039@github.com> <ZFIm743da2bNDg0P_C6hPwK8ld2QAZSiR1vt3rQJFC0=.1d280d20-15a1-42fb-970b-07e565a320c5@github.com> Message-ID: <7OFGmmeTJLL_dl8LZC0y11-crea_x6bM5Sto5_c366k=.8fda2c5a-b7b7-4c98-9a6e-b6c6cf14db3a@github.com>

On Mon, 14 Nov 2022 13:08:56 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> Julian Waters has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Revert to using simpler solution similar to the original 8274980
>
> make/hotspot/lib/CompileJvm.gmk line 67:
>
>> 65: # Hotspot cannot handle an empty build number
>> 66: VERSION_BUILD := 0
>> 67: endif
>
> I think the proposed "solution" is *much* worse than this.
Reverted to use the original, less intrusive solution from [8274980](https://github.com/openjdk/jdk/pull/11081/commits/83ed3deb29d7344bbc95a3831f2388d077bc59e9) that initially could not work with the older Visual C++ compiler (With a minor improvement to handle #define 0) ------------- PR: https://git.openjdk.org/jdk/pull/11081 From fyang at openjdk.org Tue Nov 15 01:02:03 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 15 Nov 2022 01:02:03 GMT Subject: RFR: 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" In-Reply-To: <k5Sm0Pq9PlSgmgpQ9WVOAk2J3I_-m_vovXE8WW-D7fk=.a5ce0835-a5da-4a34-80e4-f10dd32b9fcb@github.com> References: <k5Sm0Pq9PlSgmgpQ9WVOAk2J3I_-m_vovXE8WW-D7fk=.a5ce0835-a5da-4a34-80e4-f10dd32b9fcb@github.com> Message-ID: <k8M7ZLdFtl_UtFnCMrphmDL1cxQP7PCn1np7RZD7Pvw=.199da209-7ce3-40fc-8825-9d2e815d31a5@github.com> On Thu, 20 Oct 2022 08:28:09 GMT, Fei Yang <fyang at openjdk.org> wrote: > This is a trivial change renaming a formal parameter for ZBarrierSetAssembler::load_at. > > On AArch64 and RISC-V, the last formal parameter for ZBarrierSetAssembler::load_at > is named "tmp_thread". But the callers will pass an ordinary temporary register for > this parameter which has no relation with the thread register. We should rename this > formal parameter from "tmp_thread" to "tmp2". > > Testing: fastdebug builds on linux-aarch64 & linux-riscv64. Thank you! 
------------- PR: https://git.openjdk.org/jdk/pull/10783 From fyang at openjdk.org Tue Nov 15 01:02:03 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 15 Nov 2022 01:02:03 GMT Subject: Integrated: 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" In-Reply-To: <k5Sm0Pq9PlSgmgpQ9WVOAk2J3I_-m_vovXE8WW-D7fk=.a5ce0835-a5da-4a34-80e4-f10dd32b9fcb@github.com> References: <k5Sm0Pq9PlSgmgpQ9WVOAk2J3I_-m_vovXE8WW-D7fk=.a5ce0835-a5da-4a34-80e4-f10dd32b9fcb@github.com> Message-ID: <wtHxPpvHfhfxSWPIJygBAltBfq7Xg-_7KDIAjTgz28c=.41fb334e-601e-4af4-9e65-b380e682ca4d@github.com> On Thu, 20 Oct 2022 08:28:09 GMT, Fei Yang <fyang at openjdk.org> wrote: > This is a trivial change renaming a formal parameter for ZBarrierSetAssembler::load_at. > > On AArch64 and RISC-V, the last formal parameter for ZBarrierSetAssembler::load_at > is named "tmp_thread". But the callers will pass an ordinary temporary register for > this parameter which has no relation with the thread register. We should rename this > formal parameter from "tmp_thread" to "tmp2". > > Testing: fastdebug builds on linux-aarch64 & linux-riscv64. This pull request has now been integrated. 
Changeset: 93d6b1f3 Author: Fei Yang <fyang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/93d6b1f3e839a434492821ae516786c7cd4b9dc8 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" Reviewed-by: fjiang, haosun, tschatzl, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/10783 From sspitsyn at openjdk.org Tue Nov 15 03:07:58 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 15 Nov 2022 03:07:58 GMT Subject: RFR: 8296265: Use modern HTML in the JVMTI spec In-Reply-To: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com> References: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com> Message-ID: <nE_BHw3wWIR5TYZz8B2P2jea6C_6idd8xh8yUf_7aGA=.7c4f9f9e-5509-4cae-89b7-8e065ee724ab@github.com> On Fri, 11 Nov 2022 00:43:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Changes: > - removed `<b>` from TOC; > - added CSS style for TOC (to simplify customization, currently it's empty); > - removed `<b>` from function list (per Phase); > - removed `<b>` from list of events; > - introduced CSS style for bold text, replaced `<b>` tags with `<span class="bold">`; > - updated transformation rule for `"b"` elements to use `"span class=bold"` (to handle `<b>` tags in source XML file); > - dropped duplicate `"b"` transform. Looks good to me. It would also be nice to look at the resulting `jvmti.html` document. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11099 From stuefe at openjdk.org Tue Nov 15 05:31:58 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 05:31:58 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled [v2] In-Reply-To: <jzozfLUjKaaAzynbRsCnxgIOb437X_3Z6WHQ-aOgHGU=.06463d2f-0520-4b78-bc33-1ac208a5e8a8@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <ZAKMwzzFOAR6YsA_gXh7KSggKCieyLdwiKqTaNZUohU=.fa44c02f-57c2-4df0-9f85-1a4d4343c201@github.com> <jzozfLUjKaaAzynbRsCnxgIOb437X_3Z6WHQ-aOgHGU=.06463d2f-0520-4b78-bc33-1ac208a5e8a8@github.com> Message-ID: <oGA9OzR72POjo8fqBcyj8RCuM7Gv94esfMADhngRcRs=.371f50d4-0820-47ca-923c-c3eb318c1a2c@github.com> On Mon, 14 Nov 2022 02:26:54 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - reduce unnecessary diffs >> - explicit constructor for fake callstacks; revert default ctor > > This doesn't seem unreasonable, though I can't comment on the details of what the compiler may or may not, do here. > > A couple of nits below. If @iklam (or others) is okay with this I will also approve. > > Thanks. @dholmes-ora are you okay with the latest version? 
------------- PR: https://git.openjdk.org/jdk/pull/11040 From fyang at openjdk.org Tue Nov 15 06:37:58 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 15 Nov 2022 06:37:58 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default In-Reply-To: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> Message-ID: <sSd2ecLO90FhtCePmcH7--C-mmvRjmxqt64yCphUulA=.7b5d951d-66c6-41ab-a002-a35da8addd4a@github.com> On Tue, 15 Nov 2022 04:05:35 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: > The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. > > >> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" > bool UseRVA20U64 = true {ARCH product} {default} > bool UseRVC = true {ARCH product} {default} > openjdk version "20-internal" 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html > [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 > > Thanks, > Xiaolin You might want to disable UseRVA20U64 when the C extension is not available at the same time. 
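The suggestion above, turning the profile flag back off when the C extension is missing, could be sketched roughly as follows. This is a standalone illustration, not the code in `vm_version_riscv.cpp`: the `Flags` struct, the `resolve_flags` function and the bit layout chosen for `CPU_C` here are all assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical feature bit for the C (compressed) extension; the real
// HotSpot constant and its bit position may differ.
constexpr uint64_t CPU_C = 1ull << ('C' - 'A');

// Hypothetical stand-ins for the two VM flags discussed in the thread.
struct Flags {
  bool use_rva20u64 = true;  // profile flag, on by default in this sketch
  bool use_rvc = false;      // derived from the profile flag
};

// If the hardware lacks the C extension, switch off both the profile flag
// and the flag that depends on it; otherwise let the profile enable RVC.
inline void resolve_flags(uint64_t features, Flags& f) {
  if ((features & CPU_C) == 0) {
    f.use_rva20u64 = false;
    f.use_rvc = false;
  } else if (f.use_rva20u64) {
    f.use_rvc = true;
  }
}
```

With the feature bit set, both flags end up true; with it cleared, both end up false, which matches the shape of the `-XX:+PrintFlagsFinal` output quoted elsewhere in this thread.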
------------- PR: https://git.openjdk.org/jdk/pull/11155 From xlinzheng at openjdk.org Tue Nov 15 06:40:07 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 15 Nov 2022 06:40:07 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default In-Reply-To: <sSd2ecLO90FhtCePmcH7--C-mmvRjmxqt64yCphUulA=.7b5d951d-66c6-41ab-a002-a35da8addd4a@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> <sSd2ecLO90FhtCePmcH7--C-mmvRjmxqt64yCphUulA=.7b5d951d-66c6-41ab-a002-a35da8addd4a@github.com> Message-ID: <zIKZQs-rjtdcTYgc1lAbiQMAC9Es6D7nZQpq3MwHtHM=.b4581424-fd3b-443b-bcbd-92f6c922c241@github.com> On Tue, 15 Nov 2022 06:35:53 GMT, Fei Yang <fyang at openjdk.org> wrote: > You might want to disable UseRVA20U64 when the C extension is not available at the same time. Oh yes. I forgot that one. Will add the logic soon. ------------- PR: https://git.openjdk.org/jdk/pull/11155 From kbarrett at openjdk.org Tue Nov 15 06:42:58 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 15 Nov 2022 06:42:58 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <gay-N6xDnfKHcngB9ddJIZD6Jfg2m_ZCzZn1gWPFN-o=.785036e8-1d1d-41d3-bac3-211b9d03cd71@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <gay-N6xDnfKHcngB9ddJIZD6Jfg2m_ZCzZn1gWPFN-o=.785036e8-1d1d-41d3-bac3-211b9d03cd71@github.com> Message-ID: <XXVqN4ByCrB34JRZSgiNYWsdrwEOTKjo5u81sTFG5bE=.7748c17a-4a13-42aa-b10d-219fe6775da2@github.com> On Mon, 14 Nov 2022 16:12:48 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual 
C++ 2017 compiler. These cleanups were highlighted by the very briefly integrated 8296115 >> >> No changes to the behaviour of the JDK has resulted in any way from this commit > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert to using simpler solution similar to the original 8274980 A few more issues as I slog my way through this commingled PR. Probably more to come. src/hotspot/share/utilities/globalDefinitions.hpp line 50: > 48: > 49: #ifndef ATTRIBUTE_ALIGNED > 50: #define ATTRIBUTE_ALIGNED(x) alignas(x) HotSpot Group has not discussed or approved use of `alignas` - see https://bugs.openjdk.org/browse/JDK-8250269. This is another change that is independent of most of the rest of this PR, and should be dealt with separately. The various MSVC-conditional direct uses of `_declspec(align(N))` should probably currently be using `ATTRIBUTE_ALIGNED`. ------------- Changes requested by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/11081 From kbarrett at openjdk.org Tue Nov 15 06:42:59 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 15 Nov 2022 06:42:59 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <7OFGmmeTJLL_dl8LZC0y11-crea_x6bM5Sto5_c366k=.8fda2c5a-b7b7-4c98-9a6e-b6c6cf14db3a@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <x8nDS5oBvPgUmNTQie92RqlCGQpIvXY2Ribuu-YIeg0=.541a91ec-7dda-49b5-a274-6c127d6b1039@github.com> <ZFIm743da2bNDg0P_C6hPwK8ld2QAZSiR1vt3rQJFC0=.1d280d20-15a1-42fb-970b-07e565a320c5@github.com> <7OFGmmeTJLL_dl8LZC0y11-crea_x6bM5Sto5_c366k=.8fda2c5a-b7b7-4c98-9a6e-b6c6cf14db3a@github.com> Message-ID: <S1wp3kCTkaV5pa7NNeFP57Zrf2JFGs9BNzMC1dYkfXU=.262781f6-e561-42e2-893d-918cf6fc189b@github.com> On Tue, 15 Nov 2022 00:49:59 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> make/hotspot/lib/CompileJvm.gmk line 67: >> >>> 65: # 
Hotspot cannot handle an empty build number >>> 66: VERSION_BUILD := 0 >>> 67: endif >> >> I think the proposed "solution" is *much* worse than this. > > Reverted to use the original, less intrusive solution from [8274980](https://github.com/openjdk/jdk/pull/11081/commits/83ed3deb29d7344bbc95a3831f2388d077bc59e9) that initially could not work with the older Visual C++ compiler (With a minor improvement to handle #define 0) Sorry, but I don't think that's much better than the prior version, and still doesn't seem better than the current code. What problem is this change supposed to be solving? I didn't find any open bugs that seemed relevant (could be I just didn't recognize such). Whatever it is seems likely to be unrelated to any of the other changes in this PR; it would have been / would be better to deal with it separately. ------------- PR: https://git.openjdk.org/jdk/pull/11081 From kbarrett at openjdk.org Tue Nov 15 06:42:59 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 15 Nov 2022 06:42:59 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <mowWW-lO9a5Zo4iT-sbh1YTZFXO4UjKuJK7OpfrhFFo=.c7e5430a-de42-4465-ba85-10dfb8c71184@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <0fVP40VVRuOoZCEJ1M3BLubshBHbD4m_lj-j1qaGTTk=.391ade50-b3fe-4d4e-ae71-ba8a975a31cd@github.com> <mowWW-lO9a5Zo4iT-sbh1YTZFXO4UjKuJK7OpfrhFFo=.c7e5430a-de42-4465-ba85-10dfb8c71184@github.com> Message-ID: <V_tHKXIx0k0HzvHk3hnP9vVP4F0Vx4uD9a1oz5DTi54=.d8437f88-81fc-4412-a2ee-074fd4c6e9c1@github.com> On Mon, 14 Nov 2022 08:02:49 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Yep, it was renamed since the file is also named VISCPP, and I felt that matching the names was a good style change > > I think it is the file that has the "bad" name in this case. :( But okay. 
I also think the macro name should be left alone. It's the file suffix that is "bad", though I'm not convinced it's so bad as to be worth changing. (There's also `TARGET_COMPILER_visCPP`.) ------------- PR: https://git.openjdk.org/jdk/pull/11081 From thartmann at openjdk.org Tue Nov 15 06:55:53 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Nov 2022 06:55:53 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v3] In-Reply-To: <9XWZNcNcmELCLXDwpuNgpztPrw8xXajJQcj_daf4jhU=.4af44336-021f-4688-9a56-6a90c8e12f53@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <W_qQO4k5o0HZkFFjKpRIQpNRBG5ggCHata3b2f3ggR8=.9b4e4686-96a1-4c7c-a3da-449092a25d18@github.com> <9XWZNcNcmELCLXDwpuNgpztPrw8xXajJQcj_daf4jhU=.4af44336-021f-4688-9a56-6a90c8e12f53@github.com> Message-ID: <l0lf2AhMm1HQObe9BcdfgnWHENQGEVuwWBmh9Fa8Jmg=.55171123-f70a-423c-b702-cb537e011bbd@github.com> On Mon, 24 Oct 2022 09:02:58 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> Volodymyr Paprotski has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> further restrict UsePolyIntrinsics with supports_avx512vlbw > > Thanks, I'll re-run testing. > Hi @TobiHartmann you had mentioned there were some more tests to run? Looking to see what else needs fixing. Thanks. Sure, I re-submitted testing. EDIT: I see that Vladimir already did that. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From xlinzheng at openjdk.org Tue Nov 15 07:07:11 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 15 Nov 2022 07:07:11 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default [v2] In-Reply-To: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> Message-ID: <mItU3V6T_c6Zj2oYscFfV1dQZuzLLpN5JeXDNhpu-RU=.2be2d6aa-62c9-44c0-9529-5c434608fe3d@github.com> > The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. > > >> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" > bool UseRVA20U64 = true {ARCH product} {default} > bool UseRVC = true {ARCH product} {default} > openjdk version "20-internal" 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html > [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 > > Thanks, > Xiaolin Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: Turn off the default true UseRVA20U64 when hardware does not support C ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11155/files - new: https://git.openjdk.org/jdk/pull/11155/files/9556bed6..55e3dbe7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11155&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11155&range=00-01 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11155.diff Fetch: git 
fetch https://git.openjdk.org/jdk pull/11155/head:pull/11155 PR: https://git.openjdk.org/jdk/pull/11155 From xlinzheng at openjdk.org Tue Nov 15 07:07:12 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 15 Nov 2022 07:07:12 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default In-Reply-To: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> Message-ID: <7eYtazhdLJfakSWOv8X6amM6svXCjz1NYmIyj-36Qho=.e4ee4b19-adae-496a-a4b0-f56d619e847f@github.com> On Tue, 15 Nov 2022 04:05:35 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: > The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. > > >> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" > bool UseRVA20U64 = true {ARCH product} {default} > bool UseRVC = true {ARCH product} {default} > openjdk version "20-internal" 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html > [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 > > Thanks, > Xiaolin Tested by manually adding a `_features &= (~CPU_C);` after the aux vector check. 
The result is > build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" OpenJDK 64-Bit Server VM warning: RVC is not supported on this CPU OpenJDK 64-Bit Server VM warning: UseRVA20U64 is not supported on this CPU bool UseRVA20U64 = false {ARCH product} {default} bool UseRVC = false {ARCH product} {default} openjdk version "20-internal" 2023-03-21 OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) ------------- PR: https://git.openjdk.org/jdk/pull/11155 From kbarrett at openjdk.org Tue Nov 15 07:32:36 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 15 Nov 2022 07:32:36 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> Message-ID: <wpIZeGRfbop46i3T4kS9jRfkIiNuXFKvo17ymbsQPEw=.573524bb-48db-4329-9571-10480ce8ee8d@github.com> On Mon, 14 Nov 2022 19:44:17 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > delete swp file Mostly okay. There are some places where the result from `os::snprintf` could be used instead of a later `strlen`. 
Most of those are pre-existing (so could be considered for later cleanups), but in at least one case there was a new strlen call introduced, so making the code slightly worse. src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 226: > 224: char buf[512]; > 225: os::snprintf(buf, sizeof(buf), "0x%02x:0x%x:0x%03x:%d", _cpu, _variant, _model, _revision); > 226: if (_model2) os::snprintf(buf+strlen(buf), sizeof(buf) - strlen(buf), "(0x%03x)", _model2); Instead of using `strlen(buf)` (now called twice!) to get the number of characters written, use the result of the first call to `os::snprintf`. src/hotspot/os/bsd/attachListener_bsd.cpp line 251: > 249: BsdAttachOperation* BsdAttachListener::read_request(int s) { > 250: char ver_str[8]; > 251: os::snprintf(ver_str, sizeof(ver_str), "%d", ATTACH_PROTOCOL_VER); We later use `strlen(ver_str)` where we could instead use the result of `os::snprintf`. src/hotspot/os/bsd/attachListener_bsd.cpp line 294: > 292: (atoi(buf) != ATTACH_PROTOCOL_VER)) { > 293: char msg[32]; > 294: os::snprintf(msg, sizeof(msg), "%d\n", ATTACH_ERROR_BADVERSION); Rather than using `strlen(msg)` in the next line, use the result from `os::snprintf`. src/hotspot/os/bsd/attachListener_bsd.cpp line 414: > 412: // write operation result > 413: char msg[32]; > 414: os::snprintf(msg, sizeof(msg), "%d\n", result); Rather than using strlen(msg) in the next line, use the result from os::snprintf. src/hotspot/share/classfile/javaClasses.cpp line 2532: > 2530: // Print module information > 2531: if (module_name != NULL) { > 2532: buf_off = (int)strlen(buf); `buf_off` could be the result of `os::snprintf` instead of calling `strlen`. src/hotspot/share/code/dependencies.cpp line 780: > 778: } > 779: } else { > 780: char xn[12]; os::snprintf(xn, sizeof(xn), "x%d", j); Pre-existing very unusual formatting; put a line break between the statements. ------------- Changes requested by kbarrett (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Tue Nov 15 08:25:04 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 08:25:04 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> Message-ID: <nz9IPjZuX8QnocI6yC68IrCccWJHNZ4pupJJniO7hkE=.c224740b-a81b-4804-8d3a-ce98ed8e87f4@github.com> On Mon, 14 Nov 2022 19:44:17 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > delete swp file

Hi @XueleiFan, good job, this looks like it was onerous work!

One issue I noticed, in ADLC only: we sometimes use the snprintf return value to update a position pointer, e.g. in adlc output_c.cpp; should snprintf return -1, we could run backwards and overstep the beginning of the buffer. Totally up to you if you fix it, and whether as a follow-up RFE or here. If you do, the simplest way may be to add a little `stringStream`-like helper like this to ADLC:

    class AdlcStringStream {
      char* const _buf;
      const size_t _buflen;
      size_t _pos;
    public:
      AdlcStringStream(char* out, size_t outlen) : _buf(out), _buflen(outlen), _pos(0) {}
      void print(const char* fmt, ...) {
        if (_pos < _buflen) {
          va_list ap;
          va_start (ap, fmt);
          int written = vsnprintf(_buf + _pos, _buflen - _pos, fmt, ap);
          va_end(ap);
          if (written > 0) {
            _pos += written;
          }
        }
      }
      const char* buf() const { return _buf; }
    };

and use that instead. Way easier to read the code then. Optionally, the helper could even handle buffer allocation and destruction.

All other remarks inline. Small issues remain, but nothing drastic.

Cheers, Thomas

src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 226:

> 224: char buf[512];
> 225: os::snprintf(buf, sizeof(buf), "0x%02x:0x%x:0x%03x:%d", _cpu, _variant, _model, _revision);
> 226: if (_model2) os::snprintf(buf+strlen(buf), sizeof(buf) - strlen(buf), "(0x%03x)", _model2);

Here - and in several other places, where we construct a string from multiple parts - the code would be simpler with `stringStream`:

    char buf[512];
    stringStream ss(buf, sizeof(buf));
    ss.print("0x%02x:0x%x:0x%03x:%d", _cpu, _variant, _model, _revision);
    if (_model2) ss.print("(0x%03x)", _model2);
    _features_string = os::strdup(buf);

or, using `stringStream`'s internal buffer:

    stringStream ss;
    ss.print("0x%02x:0x%x:0x%03x:%d", _cpu, _variant, _model, _revision);
    if (_model2) ss.print("(0x%03x)", _model2);
    _features_string = ss.base();

No manual offset counting required. I leave it up to you if you do it that way. The code here is correct as it is.

src/hotspot/share/classfile/javaClasses.cpp line 2562:

> 2560: CompiledMethod* nm = method->code();
> 2561: if (WizardMode && nm != NULL) {
> 2562: os::snprintf(buf + buf_off, buf_size - buf_off, "(nmethod " INTPTR_FORMAT ")", (intptr_t)nm);

I think you should update `buf_off` here, because now you overwrite the last text part. Weird that no test caught that. All this code here in javaClasses.cpp would benefit from using stringStream.

src/hotspot/share/utilities/utf8.cpp line 521:

> 519: } else {
> 520: if (p + 6 >= end) break; // string is truncated
> 521: os::snprintf(p, 7, "\\u%04x", c);

This should be 6, or?
We have 6 characters left before end, assuming end is exclusive. Also, maybe use a named constant? src/java.desktop/macosx/native/libjsound/PLATFORM_API_MacOSX_Ports.cpp line 638: > 636: return; > 637: } > 638: snprintf(channelName, 16, "Ch %d", ch); Can we use a constant here instead of literal 16? ------------- Changes requested by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/11115 From eosterlund at openjdk.org Tue Nov 15 08:27:59 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 15 Nov 2022 08:27:59 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code In-Reply-To: <K4UA3mF3XBvUjVUy9c15q0CxlTgu0hsH7vpI2FUHz1w=.972d2f82-5819-40f1-a443-dc6c4e94f44b@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <K4UA3mF3XBvUjVUy9c15q0CxlTgu0hsH7vpI2FUHz1w=.972d2f82-5819-40f1-a443-dc6c4e94f44b@github.com> Message-ID: <vWxRLShyb9mE4WJgOPIW8KByU-6hQG4ZcWwSq6YnvyY=.42e5052f-e2fb-4590-947f-03ee6d7efccb@github.com> On Sat, 12 Nov 2022 08:08:15 GMT, Fei Yang <fyang at openjdk.org> wrote: >> The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. >> >> In particular, >> 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. >> >> 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. 
But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. >> >> 3) Refactoring the stack chunk allocation code >> >> Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. > > PS: I see JVM crashes when running Skynet with extra VM option: -XX:+VerifyContinuations on linux-aarch64 platform. > > $java --enable-preview -XX:+VerifyContinuations Skynet > > > # A fatal error has been detected by the Java Runtime Environment: > > # after -XX: or in .hotspotrc: SuppressErrorAt=# > # Internal Error/stackChunkOop.cpp (/home/realfyang/openjdk-jdk/src/hotspot/share/oops/stackChunkOop.cpp:433), pid=1904185:433, tid=1904206 > > [thread 1904216 also had an error]# assert(_chunk->bitmap().at(index)) failed: Bit not set at index 208 corresponding to 0x0000000637c512d0 > > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.realfyang.openjdk-jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.realfyang.openjdk-jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) @RealFYang did you have a chance to see if my RISC-V changes worked out for you? 
------------- PR: https://git.openjdk.org/jdk/pull/11111 From stuefe at openjdk.org Tue Nov 15 08:35:11 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 08:35:11 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <wpIZeGRfbop46i3T4kS9jRfkIiNuXFKvo17ymbsQPEw=.573524bb-48db-4329-9571-10480ce8ee8d@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> <wpIZeGRfbop46i3T4kS9jRfkIiNuXFKvo17ymbsQPEw=.573524bb-48db-4329-9571-10480ce8ee8d@github.com> Message-ID: <fWo7Fw-iffnP6tHcRUeRCy5Q92_nN2k5qG-B5zTmleU=.72f72ac3-f7ae-4c33-a7b2-04043e6952df@github.com> On Tue, 15 Nov 2022 07:13:49 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: >> >> delete swp file > > src/hotspot/os/bsd/attachListener_bsd.cpp line 294: > >> 292: (atoi(buf) != ATTACH_PROTOCOL_VER)) { >> 293: char msg[32]; >> 294: os::snprintf(msg, sizeof(msg), "%d\n", ATTACH_ERROR_BADVERSION); > > Rather than using `strlen(msg)` in the next line, use the result from `os::snprintf`. The problem with using the return value of os::snprintf() is that we need to handle the -1 case to prevent the position from running backward. Might be better to use stringStream instead, which should handle the -1 case transparently. 
------------- PR: https://git.openjdk.org/jdk/pull/11115 From stefank at openjdk.org Tue Nov 15 08:39:14 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 08:39:14 GMT Subject: RFR: 8296774: Removed default MEMFLAGS value from CHeapBitMap [v2] In-Reply-To: <isxb6GuvewA4RY9rOOYBDWqk4w0EkNH2aEVat1Sr_Ho=.b8eaa6c1-0f03-4d51-857d-7fdab154d374@github.com> References: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com> <isxb6GuvewA4RY9rOOYBDWqk4w0EkNH2aEVat1Sr_Ho=.b8eaa6c1-0f03-4d51-857d-7fdab154d374@github.com> Message-ID: <AeOI3H4iJirUs7K7-l8zRrhWKhGM7QkhMu-EFOTCdGU=.54dd4837-afe5-4c3c-ab90-95f56348ff5a@github.com> On Thu, 10 Nov 2022 10:22:45 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today it is easy to accidentally create CHeapBitMaps that uses the default mtInternal MEMFLAGS instead of a value that is suitable for the subsystem. I fixed the instances I could find with #10948 / [JDK-8296231](https://bugs.openjdk.org/browse/JDK-8296231). >> >> For that PR I didn't want to change the constructors of the bitmap because #10941 / [JDK-8296139](https://bugs.openjdk.org/browse/JDK-8296139) was being out for review. Now when that change has been pushed I'd like to change the constructors of the CHeapBitMap, so that we don't accidentally make these mistakes. >> >> When making it mandatory to pass MEMFLAGS, it becomes apparent that the current parameter order is a bit odd. If you look closely you see that all three parameters are optional. When I now want to make MEMFLAGS mandatory, I'd like to move it so that it always is the first parameter. This will simplify the constructors a bit, IMHO. 
>> >> This is what the constructors look like before the patch: >> >> CHeapBitMap() : CHeapBitMap(mtInternal) {} >> explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} >> CHeapBitMap(idx_t size_in_bits, MEMFLAGS flags = mtInternal, bool clear = true); >> >> >> And I'd like to change it to: >> >> explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} >> CHeapBitMap(MEMFLAGS flags, idx_t size_in_bits, bool clear = true); >> >> >> In effect, this makes `flags` mandatory and `size_in_bits` and `clear` optional. >> >> We could probably condense this even further into just one constructor: >> >> explicit CHeapBitMap(MEMFLAGS flags, size_t size_in_bits = 0, bool clear = true) : GrowableBitMap(size_in_bits, clear), _flags(flags) {} >> >> >> given that the value of `clear` doesn't matter when `size_in_bits` is 0. I didn't do that, but could be swayed to do that. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8296774_bitmap_stricter_construction > - 8296774: Removed default MEMFLAGS value from CHeapBitMap I got a suggestion in #11086 that I should keep parameter order. I'll do the same for this PR to be consistent between these two collection/utility types. 
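As a rough illustration of the outcome described above (parameter order kept, but no default for the memory tag), the following standalone sketch uses hypothetical stand-ins: `MemTag` is not HotSpot's `MEMFLAGS`, and `TaggedBitMap` is not `CHeapBitMap`. The actual signatures in the PR may differ; the sketch only shows how dropping the default argument turns a forgotten tag into a compile-time error.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for HotSpot's MEMFLAGS; not the real enum.
enum class MemTag { Internal, GC, Compiler };

// Hypothetical stand-in for CHeapBitMap. The size-first parameter order is
// kept, but the tag has no default value, so every caller must choose one.
class TaggedBitMap {
  std::vector<bool> _bits;
  MemTag _tag;
public:
  explicit TaggedBitMap(MemTag tag) : _bits(), _tag(tag) {}
  TaggedBitMap(std::size_t size_in_bits, MemTag tag)
    : _bits(size_in_bits, false), _tag(tag) {}
  std::size_t size() const { return _bits.size(); }
  MemTag tag() const { return _tag; }
  bool at(std::size_t i) const { return _bits[i]; }
};

// Constructing without a tag is rejected by the compiler.
static_assert(!std::is_default_constructible<TaggedBitMap>::value,
              "a tag is mandatory");
static_assert(!std::is_constructible<TaggedBitMap, std::size_t>::value,
              "a size alone is not enough");
static_assert(std::is_constructible<TaggedBitMap, std::size_t, MemTag>::value,
              "size plus tag is accepted");
```

A call site then reads `TaggedBitMap bm(64, MemTag::GC);`, which makes the subsystem attribution visible wherever a bitmap is created.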
------------- PR: https://git.openjdk.org/jdk/pull/11084 From stefank at openjdk.org Tue Nov 15 08:39:15 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 08:39:15 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v3] In-Reply-To: <Srzk-lLV5JsKV7KGHIMbRbON36doqa2xXVJ-2KrCEpo=.7501c608-51c5-439e-844f-83151ad87989@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <HzvC0hEgl-NTZ-I36novjuStycPmJG38kdYEA4DAGgE=.eee08df7-8725-423a-ba53-2cf1483ff6be@github.com> <Srzk-lLV5JsKV7KGHIMbRbON36doqa2xXVJ-2KrCEpo=.7501c608-51c5-439e-844f-83151ad87989@github.com> Message-ID: <AacVfpYuooCXmeRv8LRIcC3Cm_rue1p9iONLwch1vGc=.0ce815d5-2580-47df-b015-9b73b13e50c6@github.com> On Mon, 14 Nov 2022 00:12:45 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Mark constructors explicit > >> When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS moves to the front, we can now skip having to figure out an initial capacity. > > The default initial capacity is 2; the explicit initial capacities are not 2 and in many cases are much larger than 2. So without an explicit capacity passed in, these GrowableArrays would likely waste a lot of time unnecessarily growing. I don't think either of these parameters should really have a default value - in which case the order could have remained as it was. I'm going to go with @dholmes-ora's suggestion here, and do a similar change to the CHeapBitMap PR.
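Editorial sketch (not part of the thread): the cost argument above can be made concrete by counting reallocations. The helper below is hypothetical and assumes a plain capacity-doubling policy; it is not a statement about GrowableArray's exact growth behavior, only an illustration of why a default capacity of 2 is expensive for arrays that end up large.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many times a doubling array must reallocate before it can hold
// `n` elements when it starts with `initial_capacity` slots.
// Assumption: capacity doubles on every grow (hypothetical policy).
inline int count_grows(std::size_t initial_capacity, std::size_t n) {
  assert(initial_capacity > 0);
  std::size_t capacity = initial_capacity;
  int grows = 0;
  while (capacity < n) {
    capacity *= 2;  // each grow also implies copying the existing elements
    ++grows;
  }
  return grows;
}
```

Under these assumptions, `count_grows(2, 1000)` is 9 (2, 4, 8, ..., 1024), whereas `count_grows(1024, 1000)` is 0: an explicit capacity close to the expected size avoids every intermediate allocation and copy.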
------------- PR: https://git.openjdk.org/jdk/pull/11086 From stefank at openjdk.org Tue Nov 15 09:03:32 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 09:03:32 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v4] In-Reply-To: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> Message-ID: <eO6hPcWjRM3lgOPC3-ehO_PJmVdff-IPfQMQXYxPT6Q=.18613d77-5917-4f32-9ddc-1dc5ebee1240@github.com> > Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. > > I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. > > This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. > > Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see [JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have rearranged the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS moves to the front, we can now skip having to figure out an initial capacity. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8296776_growablearray_mtnone_cleanout_review - Review dholmes - Mark constructors explicit - 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11086/files - new: https://git.openjdk.org/jdk/pull/11086/files/7b43d04a..dfa1171b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=02-03 Stats: 3864 lines in 227 files changed: 2273 ins; 1062 del; 529 mod Patch: https://git.openjdk.org/jdk/pull/11086.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11086/head:pull/11086 PR: https://git.openjdk.org/jdk/pull/11086 From stefank at openjdk.org Tue Nov 15 09:32:03 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 09:32:03 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v5] In-Reply-To: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> Message-ID: <YaA64_3jMA4U1AE2flyTiWSgTazCnJnrux8BjnpcdCg=.8f69cc57-1aa1-44cc-a7ca-1ca745c6ca6e@github.com> > Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. > > I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. > > This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. 
> > Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see JDK-[JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages where forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure an initial capacity. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Try to fix 32-bit builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11086/files - new: https://git.openjdk.org/jdk/pull/11086/files/dfa1171b..72d580e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=03-04 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11086.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11086/head:pull/11086 PR: https://git.openjdk.org/jdk/pull/11086 From fyang at openjdk.org Tue Nov 15 09:41:57 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 15 Nov 2022 09:41:57 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code In-Reply-To: <K4UA3mF3XBvUjVUy9c15q0CxlTgu0hsH7vpI2FUHz1w=.972d2f82-5819-40f1-a443-dc6c4e94f44b@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <K4UA3mF3XBvUjVUy9c15q0CxlTgu0hsH7vpI2FUHz1w=.972d2f82-5819-40f1-a443-dc6c4e94f44b@github.com> Message-ID: <VCyJE9wRumQ6-HNBPaN3nZVV0WHTwaP9euMyiepJfLA=.5258685c-fc2e-465f-aa3d-aea4bc0ecde8@github.com> On Sat, 12 Nov 2022 08:08:15 GMT, Fei Yang <fyang at openjdk.org> wrote: >> The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. 
>> >> In particular, >> 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. >> >> 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. >> >> 3) Refactoring the stack chunk allocation code >> >> Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. > > PS: I see JVM crashes when running Skynet with extra VM option: -XX:+VerifyContinuations on linux-aarch64 platform. 
> > $ java --enable-preview -XX:+VerifyContinuations Skynet > > > # A fatal error has been detected by the Java Runtime Environment: > > # after -XX: or in .hotspotrc: SuppressErrorAt=# > # Internal Error/stackChunkOop.cpp (/home/realfyang/openjdk-jdk/src/hotspot/share/oops/stackChunkOop.cpp:433), pid=1904185:433, tid=1904206 > > [thread 1904216 also had an error]# assert(_chunk->bitmap().at(index)) failed: Bit not set at index 208 corresponding to 0x0000000637c512d0 > > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.realfyang.openjdk-jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.realfyang.openjdk-jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > @RealFYang did you have a chance to see if my RISC-V changes worked out for you? Hi, I have performed tier1-3 tests on my linux-riscv64 HiFive Unmatched boards. The results look good. Thanks for handling RISC-V at the same time :-) ------------- PR: https://git.openjdk.org/jdk/pull/11111 From chagedorn at openjdk.org Tue Nov 15 09:49:38 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Nov 2022 09:49:38 GMT Subject: RFR: JDK-8296931: NMT tests slowed down considerably by JDK-8242181 In-Reply-To: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> References: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> Message-ID: <SCzkwORZA3bauy4o16hND6REwbd3cg98jypIjJfIeGY=.0cd60560-ac68-44ad-9811-9bf83c97fa57@github.com> On Mon, 14 Nov 2022 12:43:41 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > We noticed that NMT tests on our slower PPC machines started failing. > > The reason is that NMT detail reports have become 2-5x slower. This is caused by us now parsing the DWARF debug information to extract source information for each PC in each call stack. That is nice but costly.
> > The slowdown is not limited to PPC; it affects all ELF platforms. On my Linux x64 box, runtime/NMT/VirtualAllocCommitMerge.java increased from 20 to 90 seconds. > > --- > > This patch simply removes source info from NMT call stacks. It is not that important for pinpointing leaks and such. I considered more involved solutions, like making it optional via an argument to the NMT report command, but decided against it. The added benefit would be small, not worth much complexity. > > With this patch, on my box with -conc 4 all NMT tests together are about 2.5x faster (2m56 -> 1m09). Thanks for the closer analysis of execution time. I agree with your proposed solution and with deciding against an additional option/flag, given the complexity and the limited benefit. The main purpose and motivation behind JDK-8242181 was to get the additional source information in hs_err files. src/hotspot/share/utilities/nativeCallStack.cpp line 100: > 98: // Note: we deliberately omit printing source information here. NativeCallStack::print_on() > 99: // can be called thousands of times as part of NMT detail reporting, and source printing > 100: // can slow down reporting by a factor of 5 or more depending on platform (see JDK-8296931). I'm not sure what the convention is: should we still directly refer to bug numbers in comments? ------------- Marked as reviewed by chagedorn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11135 From mcimadamore at openjdk.org Tue Nov 15 10:08:22 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 10:08:22 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v18] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <0aDgn8bkT3gjULRqLX7_1doqGRJhDlva7S3Q-uYBtZ4=.23b372a9-8775-4d0c-900f-c8a12d1769b1@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Tweak preview feature description for JEP 434 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/cd3fbe7c..9b97bad6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=16-17 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From stefank at openjdk.org Tue Nov 15 10:11:20 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 10:11:20 GMT Subject: RFR: 8296774: Removed default MEMFLAGS value from CHeapBitMap [v3] In-Reply-To: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com> References: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com> Message-ID: 
<ZTeey3fKU3nGct4NWNzjPeWMXEnytpMmoVFG-f2GW7Y=.6caf0cbd-8c4d-49ae-aeeb-a0681e759c75@github.com> > Today it is easy to accidentally create CHeapBitMaps that uses the default mtInternal MEMFLAGS instead of a value that is suitable for the subsystem. I fixed the instances I could find with #10948 / [JDK-8296231](https://bugs.openjdk.org/browse/JDK-8296231). > > For that PR I didn't want to change the constructors of the bitmap because #10941 / [JDK-8296139](https://bugs.openjdk.org/browse/JDK-8296139) was being out for review. Now when that change has been pushed I'd like to change the constructors of the CHeapBitMap, so that we don't accidentally make these mistakes. > > When making it mandatory to pass MEMFLAGS, it becomes apparent that the current parameter order is a bit odd. If you look closely you see that all three parameters are optional. When I now want to make MEMFLAGS mandatory, I'd like to move it so that it always is the first parameter. This will simplify the constructors a bit, IMHO. > > This is what the constructors look like before the patch: > > CHeapBitMap() : CHeapBitMap(mtInternal) {} > explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} > CHeapBitMap(idx_t size_in_bits, MEMFLAGS flags = mtInternal, bool clear = true); > > > And I'd like to change it to: > > explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} > CHeapBitMap(MEMFLAGS flags, idx_t size_in_bits, bool clear = true); > > > In effect, this makes `flags` mandatory and `size_in_bits` and `clear` optional. > > We could probably condense this even further into just one constructor: > > explicit CHeapBitMap(MEMFLAGS flags, size_t size_in_bits = 0, bool clear = true) : GrowableBitMap(size_in_bits, clear), _flags(flags) {} > > > given that the value of `clear` doesn't matter when `size_in_bits` is 0. I didn't do that, but could be swayed to do that. 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Align parameter order with GrowableArray changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11084/files - new: https://git.openjdk.org/jdk/pull/11084/files/d1a3069b..9f9c64f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11084&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11084&range=01-02 Stats: 20 lines in 13 files changed: 0 ins; 1 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/11084.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11084/head:pull/11084 PR: https://git.openjdk.org/jdk/pull/11084 From mcimadamore at openjdk.org Tue Nov 15 10:12:12 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 10:12:12 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v19] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <43YEgUwCbX4IMeM2AjG_ZAytW-ibfIqCPW1fmBoYDpQ=.e2ef76bd-b10b-4785-976b-974501043f28@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 49 additional commits since the last revision: - Merge branch 'master' into PR_20 - Tweak preview feature description for JEP 434 - Tweak Arena::close javadoc - Merge pull request #15 from minborg/test Add @apiNote to package-info - Add @apiNote to package-info - Merge pull request #16 from minborg/fix-tests2 Fix failing tests - Fix failing tests - Rename isOwnedBy -> isCloseableBy Fix minor typos Fix StrLenTest/RingAllocator - Fix typo - More javadoc fixes - ... and 39 more: https://git.openjdk.org/jdk/compare/0ecc71f0...20ee6e8d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/9b97bad6..20ee6e8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=17-18 Stats: 15095 lines in 530 files changed: 6855 ins; 6001 del; 2239 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From stefank at openjdk.org Tue Nov 15 10:17:04 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 10:17:04 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v6] In-Reply-To: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> Message-ID: <jz2IZEzX0S_aoZcwrDwjlE-m2cqotDY1uCV6KxpTLyY=.12e19469-ea17-4e87-b3c9-2f25ca076a16@github.com> > Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. > > I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. > > This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. 
Those asserts will be trivial to remove when/if mtNone is removed. > > Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see JDK-[JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages where forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure an initial capacity. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Move now unnecessary 'explicit' specifier ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11086/files - new: https://git.openjdk.org/jdk/pull/11086/files/72d580e4..b2cfd572 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11086.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11086/head:pull/11086 PR: https://git.openjdk.org/jdk/pull/11086 From stuefe at openjdk.org Tue Nov 15 10:17:05 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 10:17:05 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v6] In-Reply-To: <jz2IZEzX0S_aoZcwrDwjlE-m2cqotDY1uCV6KxpTLyY=.12e19469-ea17-4e87-b3c9-2f25ca076a16@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <jz2IZEzX0S_aoZcwrDwjlE-m2cqotDY1uCV6KxpTLyY=.12e19469-ea17-4e87-b3c9-2f25ca076a16@github.com> Message-ID: <q6ZXWGWUvi1ziN3WtiyH3ATkz1U_5IAxu8oc7mOYQAA=.40286f67-a2e3-4b53-b417-2f937fbedb41@github.com> On Tue, 15 Nov 2022 10:13:14 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today we use mtNone to denote that a GrowableArray should *NOT* be 
backed by a CHeap allocated array. >> >> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. >> >> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. >> >> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see JDK-[JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages where forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure an initial capacity. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Move now unnecessary 'explicit' specifier Looks good to me. I like this version better. Thanks for doing this. Small nits remain, nothing big. ------------- Marked as reviewed by stuefe (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11086 From stuefe at openjdk.org Tue Nov 15 10:17:07 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 10:17:07 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v5] In-Reply-To: <YaA64_3jMA4U1AE2flyTiWSgTazCnJnrux8BjnpcdCg=.8f69cc57-1aa1-44cc-a7ca-1ca745c6ca6e@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <YaA64_3jMA4U1AE2flyTiWSgTazCnJnrux8BjnpcdCg=.8f69cc57-1aa1-44cc-a7ca-1ca745c6ca6e@github.com> Message-ID: <XIBSXlkonAH-f14Yvypke_-pWX8hAglsBn8bgI-hOjY=.1c554168-0696-463a-b323-8ce3c54a376e@github.com> On Tue, 15 Nov 2022 09:32:03 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. >> >> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. >> >> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. >> >> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see JDK-[JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages where forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure an initial capacity. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Try to fix 32-bit builds src/hotspot/share/utilities/growableArray.hpp line 603: > 601: > 602: // Resource allocation > 603: uintptr_t bits() const { Is there still a point to this overload? 
Can we just init with literal 0 instead? src/hotspot/share/utilities/growableArray.hpp line 609: > 607: // CHeap allocation > 608: uintptr_t bits(MEMFLAGS memflags) const { > 609: assert(memflags != mtNone, "Must provide a proper MEMFLAGS"); Can be made static? src/hotspot/share/utilities/growableArray.hpp line 613: > 611: } > 612: > 613: // Arena allocation Can be made static? ------------- PR: https://git.openjdk.org/jdk/pull/11086 From stuefe at openjdk.org Tue Nov 15 10:17:07 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 10:17:07 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v5] In-Reply-To: <XIBSXlkonAH-f14Yvypke_-pWX8hAglsBn8bgI-hOjY=.1c554168-0696-463a-b323-8ce3c54a376e@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <YaA64_3jMA4U1AE2flyTiWSgTazCnJnrux8BjnpcdCg=.8f69cc57-1aa1-44cc-a7ca-1ca745c6ca6e@github.com> <XIBSXlkonAH-f14Yvypke_-pWX8hAglsBn8bgI-hOjY=.1c554168-0696-463a-b323-8ce3c54a376e@github.com> Message-ID: <b0jZQIDzcu3bB2nJ5U4ZTDnWarS_XDEXl4IGs2XClXI=.9a6ef567-6847-4839-bc44-e1332be63a27@github.com> On Tue, 15 Nov 2022 10:10:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Try to fix 32-bit builds > > src/hotspot/share/utilities/growableArray.hpp line 613: > >> 611: } >> 612: >> 613: // Arena allocation > > Can be made static? 
Also, possibly assert _arena & 1 == 0 ------------- PR: https://git.openjdk.org/jdk/pull/11086 From stuefe at openjdk.org Tue Nov 15 10:19:02 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 10:19:02 GMT Subject: RFR: JDK-8296931: NMT tests slowed down considerably by JDK-8242181 In-Reply-To: <SCzkwORZA3bauy4o16hND6REwbd3cg98jypIjJfIeGY=.0cd60560-ac68-44ad-9811-9bf83c97fa57@github.com> References: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> <SCzkwORZA3bauy4o16hND6REwbd3cg98jypIjJfIeGY=.0cd60560-ac68-44ad-9811-9bf83c97fa57@github.com> Message-ID: <8TqXvGqlLMRQ-A6LpLbvysVOyp2ey1c3dhceQ9W9qrs=.695421c0-9fa1-46f0-8c8f-f045e038fa9e@github.com> On Tue, 15 Nov 2022 09:46:38 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> We noticed that NMT tests on our slower PPC machines started failing. >> >> The reason is that NMT detail reports have become 2-5x slower. This is caused by us now parsing the dwarf debug information to extract source information for each PC in each call stack. That is nice but costly. >> >> The slowdown is not limited to PPC, it affects all Elf platforms. On my Linux x64 box, runtime/NMT/VirtualAllocCommitMerge.java increased from 20 to 90 seconds. >> >> --- >> >> This patch simply removes source info from NMT call stacks. They are not that important for pinpointing leaks and such. I considered more involved solutions, like making them optional via an argument to the NMT report command, but decided against it. The added benefit would be small, not worth much complexity. >> >> With this patch, on my box with -conc 4 all NMT together are about 2.5 x faster (2m56 -> 1m09). > > Thanks for the closer analysis of execution time. I agree with your proposed solution and deciding against an additional option/flag, given the complexity and the limited benefit. The main purpose and motivation behind JDK-8242181 was to get the additional source information in hs_err files. 
Thank you @chhagedorn! I also think NMT is special since we print thousands of callstacks, not just one as we do in error reporting. About mentioning JBS issues, not sure. Not sure what the harm would be. Let's ask @dholmes-ora? ------------- PR: https://git.openjdk.org/jdk/pull/11135 From chagedorn at openjdk.org Tue Nov 15 10:19:55 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Nov 2022 10:19:55 GMT Subject: RFR: 8295952: Problemlist existing compiler/rtm tests also on x86 In-Reply-To: <mkKq-X8pqY91OIEeoTxdeXvPr3Xsuk2P1IGcVaB0mt0=.b8c17931-2d34-49cd-b22e-03cf8d23ce33@github.com> References: <mkKq-X8pqY91OIEeoTxdeXvPr3Xsuk2P1IGcVaB0mt0=.b8c17931-2d34-49cd-b22e-03cf8d23ce33@github.com> Message-ID: <HI31R9JRc6JkuWSfbSEGyxfbRxBjUsvrqgIZYV7XFwA=.44687b45-54a8-4799-a4b1-be58b6adc271@github.com> On Wed, 26 Oct 2022 16:43:26 GMT, zzambers <duke at openjdk.org> wrote: > Problemlist should be extended so that existing compiler/rtm entries include x86 (32-bit) Intel builds as well, as these are also affected. Looks good and trivial! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10875 From stefank at openjdk.org Tue Nov 15 10:42:58 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 10:42:58 GMT Subject: RFR: 8296926: Use proper include lines for files in include/ [v2] In-Reply-To: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> Message-ID: <advo_9N-gWLIYj0pvZdBMKdX99rRMRKTIHUCybir2tI=.09dbeb3c-a27d-4195-98bb-9923d25dda1d@github.com> > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that doesn't serve any practical purpose, IMHO.
It also goes against our written style guide around include files. One argument for why it was OK to have the files in include/ pushed up to the top of the sorted block was that the files were included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. To remove this special case, I've removed the extraneous makefile entry that put src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Remove include/ from includes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11133/files - new: https://git.openjdk.org/jdk/pull/11133/files/38707dff..a4a479ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11133&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11133&range=00-01 Stats: 207 lines in 137 files changed: 69 ins; 69 del; 69 mod Patch: https://git.openjdk.org/jdk/pull/11133.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11133/head:pull/11133 PR: https://git.openjdk.org/jdk/pull/11133 From thartmann at openjdk.org Tue Nov 15 10:56:33 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Nov 2022 10:56:33 GMT Subject: RFR: 8296956: [JVMCI] HotSpotResolvedJavaFieldImpl.getIndex returns wrong value In-Reply-To: <KZxa8Bort_KINFAAyAIGsZgRF6PBRxJGONS_u8t_rVI=.de61e7e8-518a-4cf5-b2b5-68cf4f71cb42@github.com> References: <KZxa8Bort_KINFAAyAIGsZgRF6PBRxJGONS_u8t_rVI=.de61e7e8-518a-4cf5-b2b5-68cf4f71cb42@github.com> Message-ID:
<BdI8UaFabbx4sJF1SBGfBp3Jtf271JhHAKO35O8p4zo=.c94fb067-287b-4174-847e-beb007194ce4@github.com> On Mon, 14 Nov 2022 19:37:20 GMT, Doug Simon <dnsimon at openjdk.org> wrote: > This PR fixes a bug related to `HotSpotResolvedJavaFieldImpl.index`. Its value is passed into the `HotSpotResolvedJavaFieldImpl` constructor as an `int`, and is returned by `getIndex()` as an `int` but it was stored as a `short`. This meant that unsigned 16-bit values were not handled correctly. > > Also included are some related JVMCI cleanups: > * added and fixed doc related to `ResolvedJavaField.getOffset()` > * replaced assertions with always-enabled checks Looks good to me otherwise. src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 270: > 268: opcode == Bytecodes.INVOKEVIRTUAL || > 269: opcode == Bytecodes.INVOKESPECIAL || > 270: opcode == Bytecodes.INVOKESTATIC) { Indentation looks a bit off. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/11142 From dnsimon at openjdk.org Tue Nov 15 10:56:35 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 15 Nov 2022 10:56:35 GMT Subject: RFR: 8296956: [JVMCI] HotSpotResolvedJavaFieldImpl.getIndex returns wrong value In-Reply-To: <BdI8UaFabbx4sJF1SBGfBp3Jtf271JhHAKO35O8p4zo=.c94fb067-287b-4174-847e-beb007194ce4@github.com> References: <KZxa8Bort_KINFAAyAIGsZgRF6PBRxJGONS_u8t_rVI=.de61e7e8-518a-4cf5-b2b5-68cf4f71cb42@github.com> <BdI8UaFabbx4sJF1SBGfBp3Jtf271JhHAKO35O8p4zo=.c94fb067-287b-4174-847e-beb007194ce4@github.com> Message-ID: <w6GptGr9YBvbeBfcmfX74o9UdianeYc3YN9q0bH0uBc=.8950f5db-6bfe-4798-b214-53c0299e7f0a@github.com> On Tue, 15 Nov 2022 10:50:29 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> This PR fixes a bug related to `HotSpotResolvedJavaFieldImpl.index`. Its value is passed into the `HotSpotResolvedJavaFieldImpl` constructor as an `int`, and is returned by `getIndex()` as an `int` but it was stored as a `short`. 
This meant that unsigned 16-bit values were not handled correctly. >> >> Also included are some related JVMCI cleanups: >> * added and fixed doc related to `ResolvedJavaField.getOffset()` >> * replaced assertions with always-enabled checks > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 270: > >> 268: opcode == Bytecodes.INVOKEVIRTUAL || >> 269: opcode == Bytecodes.INVOKESPECIAL || >> 270: opcode == Bytecodes.INVOKESTATIC) { > > Indentation looks a bit off. That's due to the Eclipse formatter enforced style adopted across JVMCI and Graal. I can add comments to explicitly disable it here if you want but I'm not sure it's worth it. ------------- PR: https://git.openjdk.org/jdk/pull/11142 From stefank at openjdk.org Tue Nov 15 11:03:08 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 11:03:08 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v7] In-Reply-To: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> Message-ID: <NUba8hBDFqcSOZNsfvVfP9LSCDPdxBobYdngjQngpk4=.7b4e6bdd-a7c3-4581-a75c-155f15ced3e7@github.com> > Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. > > I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. > > This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. > > Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see JDK-[JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. 
When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure out an initial capacity. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Review tstuefe ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11086/files - new: https://git.openjdk.org/jdk/pull/11086/files/b2cfd572..7a0ab735 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=05-06 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/11086.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11086/head:pull/11086 PR: https://git.openjdk.org/jdk/pull/11086 From stefank at openjdk.org Tue Nov 15 11:03:08 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 11:03:08 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v6] In-Reply-To: <jz2IZEzX0S_aoZcwrDwjlE-m2cqotDY1uCV6KxpTLyY=.12e19469-ea17-4e87-b3c9-2f25ca076a16@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <jz2IZEzX0S_aoZcwrDwjlE-m2cqotDY1uCV6KxpTLyY=.12e19469-ea17-4e87-b3c9-2f25ca076a16@github.com> Message-ID: <2Wp9f9CBP8JAr0Ozzq3_Xi4HqRQWplIhOPdTlK_oMf0=.dda4b9ff-c9ab-4041-ac0c-ea4a19d26beb@github.com> On Tue, 15 Nov 2022 10:17:04 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. >> >> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. >> >> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value.
Those asserts will be trivial to remove when/if mtNone is removed. >> >> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see [JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure out an initial capacity. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Move now unnecessary 'explicit' specifier Thanks for looking at this, Thomas. I'm fixing most of your suggestions but left bits() alone, since I like it. ------------- PR: https://git.openjdk.org/jdk/pull/11086 From stefank at openjdk.org Tue Nov 15 11:03:09 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 11:03:09 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v5] In-Reply-To: <b0jZQIDzcu3bB2nJ5U4ZTDnWarS_XDEXl4IGs2XClXI=.9a6ef567-6847-4839-bc44-e1332be63a27@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <YaA64_3jMA4U1AE2flyTiWSgTazCnJnrux8BjnpcdCg=.8f69cc57-1aa1-44cc-a7ca-1ca745c6ca6e@github.com> <XIBSXlkonAH-f14Yvypke_-pWX8hAglsBn8bgI-hOjY=.1c554168-0696-463a-b323-8ce3c54a376e@github.com> <b0jZQIDzcu3bB2nJ5U4ZTDnWarS_XDEXl4IGs2XClXI=.9a6ef567-6847-4839-bc44-e1332be63a27@github.com> Message-ID: <ZRpGVAiXyJn21YgPGFcxrJ1bamnMvu1i5108HGGQ61k=.b706520b-8bd3-4454-9396-bfea46c64887@github.com> On Tue, 15 Nov 2022 10:11:17 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> src/hotspot/share/utilities/growableArray.hpp line 613: >> >>> 611: } >>> 612: >>> 613: // Arena allocation >> >> Can be made static?
> > Also, possibly assert _arena & 1 == 0 Done ------------- PR: https://git.openjdk.org/jdk/pull/11086 From stefank at openjdk.org Tue Nov 15 11:03:09 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 11:03:09 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v5] In-Reply-To: <XIBSXlkonAH-f14Yvypke_-pWX8hAglsBn8bgI-hOjY=.1c554168-0696-463a-b323-8ce3c54a376e@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <YaA64_3jMA4U1AE2flyTiWSgTazCnJnrux8BjnpcdCg=.8f69cc57-1aa1-44cc-a7ca-1ca745c6ca6e@github.com> <XIBSXlkonAH-f14Yvypke_-pWX8hAglsBn8bgI-hOjY=.1c554168-0696-463a-b323-8ce3c54a376e@github.com> Message-ID: <CQUTp8YjXQTlwgbe8B0R0xk1KvjKyFl82leyTwcXGjs=.bd13416c-1bc0-45b4-8f27-873ffdd3e59a@github.com> On Tue, 15 Nov 2022 10:09:27 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Try to fix 32-bit builds > > src/hotspot/share/utilities/growableArray.hpp line 603: > >> 601: >> 602: // Resource allocation >> 603: uintptr_t bits() const { > > Is there still a point to this overload? Can we just init with literal 0 instead? I prefer this style more since it keeps the bit implementations together and it makes the constructors look consistent. 
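[Editorial sketch, not part of the original message.] The `bits()` overloads and the suggested low-bit assert discussed above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the code from the pull request: `ArrayMetadata`, `MemFlag` and the empty `Arena` struct are hypothetical stand-ins, and the encoding (zero for resource-area allocation, an untagged pointer for arena allocation, a shifted flag with the low bit set for C-heap allocation) is only assumed from the review comments. One certain detail: in C++ the expression `_arena & 1 == 0` parses as `_arena & (1 == 0)`, so a real assert needs parentheses around the mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for HotSpot's Arena and MEMFLAGS types.
struct alignas(8) Arena {};
enum class MemFlag : std::uint8_t { mtNone = 0, mtGC = 1, mtCompiler = 2 };

// Hypothetical metadata word: one uintptr_t records where the backing
// array lives (resource area, arena, or C heap).
class ArrayMetadata {
  std::uintptr_t _bits;

  // Resource-area allocation: nothing to record, so the word is zero.
  static std::uintptr_t bits() { return 0; }

  // C-heap allocation: keep the flag above bit 0 and set bit 0 as a tag.
  static std::uintptr_t bits(MemFlag flag) {
    assert(flag != MemFlag::mtNone && "a real memory flag is required");
    return (static_cast<std::uintptr_t>(flag) << 1) | 1;
  }

  // Arena allocation: store the pointer itself. Its low bit must be clear,
  // otherwise it would be mistaken for a C-heap tag. Note the parentheses.
  static std::uintptr_t bits(const Arena* arena) {
    std::uintptr_t p = reinterpret_cast<std::uintptr_t>(arena);
    assert((p & 1) == 0 && "arena pointer must leave the tag bit clear");
    return p;
  }

public:
  // All three constructors initialise _bits through a bits() overload,
  // which keeps the encoding in one place.
  ArrayMetadata() : _bits(bits()) {}
  explicit ArrayMetadata(MemFlag flag) : _bits(bits(flag)) {}
  explicit ArrayMetadata(const Arena* arena) : _bits(bits(arena)) {}

  bool on_c_heap() const { return (_bits & 1) == 1; }
  bool on_resource_area() const { return _bits == 0; }
  bool on_arena() const { return _bits != 0 && (_bits & 1) == 0; }

  MemFlag mem_flag() const {
    assert(on_c_heap());
    return static_cast<MemFlag>(_bits >> 1);
  }
  const Arena* arena() const {
    assert(on_arena());
    return reinterpret_cast<const Arena*>(_bits);
  }
};
```

In this sketch a default-constructed `ArrayMetadata` reports `on_resource_area()`, while `ArrayMetadata(MemFlag::mtGC)` reports `on_c_heap()` and returns the flag from `mem_flag()`. Having the no-argument `bits()` return 0 instead of writing a literal 0 in the constructor is the consistency argument made in the reply above.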
------------- PR: https://git.openjdk.org/jdk/pull/11086 From stefank at openjdk.org Tue Nov 15 11:08:15 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 11:08:15 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v8] In-Reply-To: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> Message-ID: <yjTlwjvdvgoyI6ScfYzSvhK54QiXMAAnT0gO0nroB3o=.ca108a44-c1a3-449d-8f95-259a1d50131b@github.com> > Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. > > I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. > > This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. > > Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see [JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure out an initial capacity.
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Spelling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11086/files - new: https://git.openjdk.org/jdk/pull/11086/files/7a0ab735..f5d24401 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11086&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11086.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11086/head:pull/11086 PR: https://git.openjdk.org/jdk/pull/11086 From stuefe at openjdk.org Tue Nov 15 11:11:32 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 11:11:32 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v8] In-Reply-To: <yjTlwjvdvgoyI6ScfYzSvhK54QiXMAAnT0gO0nroB3o=.ca108a44-c1a3-449d-8f95-259a1d50131b@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <yjTlwjvdvgoyI6ScfYzSvhK54QiXMAAnT0gO0nroB3o=.ca108a44-c1a3-449d-8f95-259a1d50131b@github.com> Message-ID: <kyZEaB4v8VUOKBYt04ugVpS7fwHZpjU9ehauVxeJVSI=.e8ee32cf-9837-43f5-a559-5d03b4304192@github.com> On Tue, 15 Nov 2022 11:08:15 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. >> >> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. >> >> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. >> >> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see JDK-[JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. 
When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages were forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure out an initial capacity. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Spelling All good Stefan! I already approved. Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/11086 From mcimadamore at openjdk.org Tue Nov 15 11:14:35 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 11:14:35 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v20] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <uH-g-J7i6tgekoK2MJkbkul4wi9QTsh6aQIG46heMIQ=.3886b8e6-a605-4d6f-bb78-844c80a73a1e@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Rename MemorySession -> SegmentScope Improve javadoc of SegmentScope/Arena Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/20ee6e8d..5ae5864a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=18-19 Stats: 1298 lines in 125 files changed: 174 ins; 177 del; 947 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Tue Nov 15 11:16:22 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 11:16:22 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v19] In-Reply-To: <43YEgUwCbX4IMeM2AjG_ZAytW-ibfIqCPW1fmBoYDpQ=.e2ef76bd-b10b-4785-976b-974501043f28@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <43YEgUwCbX4IMeM2AjG_ZAytW-ibfIqCPW1fmBoYDpQ=.e2ef76bd-b10b-4785-976b-974501043f28@github.com> Message-ID: <8jsBP6xJ2lT5UEIEHaGfI_Juqtj_pD1Plp7oynz81Zo=.695ba1b6-bfa9-442d-9cf2-425a7ed5a352@github.com> On Tue, 15 Nov 2022 10:12:12 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 49 additional commits since the last revision: > > - Merge branch 'master' into PR_20 > - Tweak preview feature description for JEP 434 > - Tweak Arena::close javadoc > - Merge pull request #15 from minborg/test > > Add @apiNote to package-info > - Add @apiNote to package-info > - Merge pull request #16 from minborg/fix-tests2 > > Fix failing tests > - Fix failing tests > - Rename isOwnedBy -> isCloseableBy > Fix minor typos > Fix StrLenTest/RingAllocator > - Fix typo > - More javadoc fixes > - ... and 39 more: https://git.openjdk.org/jdk/compare/3ebf94de...20ee6e8d I've renamed `MemorySession` to `SegmentScope`, following some internal and external feedback. I've also greatly improved the javadoc of both `Arena` and `SegmentScope`. A javadoc of the API contained in this iteration can be found here: http://cr.openjdk.java.net/~mcimadamore/jdk/8295044/v3/javadoc/java.base/module-summary.html ------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Tue Nov 15 11:19:26 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 11:19:26 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v21] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <I7SjmaES3c_FaGDMQXN3JC-RmlSuA2iU7adIq3jhRzE=.21342938-1f80-40d0-b843-3dd522da34bb@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/5ae5864a..3d9cebde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=19-20 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From stefank at openjdk.org Tue Nov 15 11:16:52 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 11:16:52 GMT Subject: RFR: 8297020: Rename GrowableArray::on_stack Message-ID: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> GrowableArray::on_stack is confusing. It returns true if the backing elements are allocated in the stack's resource area. We typically call this resource allocations, not stack allocations. I propose that we rename it. 
------------- Depends on: https://git.openjdk.org/jdk/pull/11086 Commit messages: - Rename GrowableArray::on_stack Changes: https://git.openjdk.org/jdk/pull/11161/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11161&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297020 Stats: 22 lines in 3 files changed: 0 ins; 0 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/11161.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11161/head:pull/11161 PR: https://git.openjdk.org/jdk/pull/11161 From stuefe at openjdk.org Tue Nov 15 11:26:51 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 11:26:51 GMT Subject: RFR: 8297020: Rename GrowableArray::on_stack In-Reply-To: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> References: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> Message-ID: <f_-u_83sgrBmecuZ3ECJP_IbGkZlATWA8Hu90dFTj3M=.cadb8789-2aa0-406e-9ec1-4382e78ca965@github.com> On Tue, 15 Nov 2022 11:09:56 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > GrowableArray::on_stack is confusing. It returns true if the backing elements are allocated in the stack's resource area. We typically call this resource allocations, not stack allocations. I propose that we rename it. +1 ------------- Marked as reviewed by stuefe (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11161 From yadongwang at openjdk.org Tue Nov 15 11:40:07 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Tue, 15 Nov 2022 11:40:07 GMT Subject: RFR: 8296916: RISC-V: Move some small macro-assembler functions to header file [v2] In-Reply-To: <4HhSvQNYeFOwFEKahHQuORkDDT7q8_Ihyb8jlGzo5aY=.cefa5cc4-f3d6-4191-bd9e-b35582752cf4@github.com> References: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> <4HhSvQNYeFOwFEKahHQuORkDDT7q8_Ihyb8jlGzo5aY=.cefa5cc4-f3d6-4191-bd9e-b35582752cf4@github.com> Message-ID: <8FCs-qei-0JbEuGW9ybJzbjOe83gHJOco1m4ziMa-Tg=.342ddbd7-c405-4fa9-9ee7-ae49797f44c4@github.com> On Mon, 14 Nov 2022 02:58:20 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Witnessed that there are some small macro-assembler functions located in file macroAssembler_riscv.cpp. >> These are small functions which mostly contain only a single line of code. We should move them to the >> corresponding header file so that they have a chance to be inlined. >> >> Testing: Tier1 on linux-riscv64 HiFive unmatched board. > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Review lgtm ------------- Marked as reviewed by yadongwang (Author). 
PR: https://git.openjdk.org/jdk/pull/11130 From jwaters at openjdk.org Tue Nov 15 12:25:06 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 15 Nov 2022 12:25:06 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <S1wp3kCTkaV5pa7NNeFP57Zrf2JFGs9BNzMC1dYkfXU=.262781f6-e561-42e2-893d-918cf6fc189b@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <x8nDS5oBvPgUmNTQie92RqlCGQpIvXY2Ribuu-YIeg0=.541a91ec-7dda-49b5-a274-6c127d6b1039@github.com> <ZFIm743da2bNDg0P_C6hPwK8ld2QAZSiR1vt3rQJFC0=.1d280d20-15a1-42fb-970b-07e565a320c5@github.com> <7OFGmmeTJLL_dl8LZC0y11-crea_x6bM5Sto5_c366k=.8fda2c5a-b7b7-4c98-9a6e-b6c6cf14db3a@github.com> <S1wp3kCTkaV5pa7NNeFP57Zrf2JFGs9BNzMC1dYkfXU=.262781f6-e561-42e2-893d-918cf6fc189b@github.com> Message-ID: <iSTDY2tJsH0cZdGm-b43DfFGTQN8vKa_hBBHMgjKu_k=.3c3d967c-1bd7-43f1-85a5-aa5a3ece6171@github.com> On Tue, 15 Nov 2022 06:12:52 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Reverted to use the original, less intrusive solution from [8274980](https://github.com/openjdk/jdk/pull/11081/commits/83ed3deb29d7344bbc95a3831f2388d077bc59e9) that initially could not work with the older Visual C++ compiler (With a minor improvement to handle #define 0) > > Sorry, but I don't think that's much better than the prior version, and still doesn't seem better than the current code. What problem is this change supposed to be solving? I didn't find any open bugs that seemed relevant (could be I just didn't recognize such). Whatever it is seems likely to be unrelated to any of the other changes in this PR; it would have been / would be better to deal with it separately. It's related to the JEP-223 version build numbers, which the original commit couldn't get to work due to a preprocessor trick not working with the now unsupported 2017. 
I just felt it could be included here out of convenience, but I guess I can split it off into another change if required ------------- PR: https://git.openjdk.org/jdk/pull/11081 From mbaesken at openjdk.org Tue Nov 15 12:26:02 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 15 Nov 2022 12:26:02 GMT Subject: RFR: JDK-8296931: NMT tests slowed down considerably by JDK-8242181 In-Reply-To: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> References: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> Message-ID: <8I1jJI5BNTHGrdHRVwvGKphjM0ll1IQTdSwYne8fj0U=.18fcc929-6d39-42f5-a719-51d0d1a34ce8@github.com> On Mon, 14 Nov 2022 12:43:41 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > We noticed that NMT tests on our slower PPC machines started failing. > > The reason is that NMT detail reports have become 2-5x slower. This is caused by us now parsing the dwarf debug information to extract source information for each PC in each call stack. That is nice but costly. > > The slowdown is not limited to PPC, it affects all Elf platforms. On my Linux x64 box, runtime/NMT/VirtualAllocCommitMerge.java increased from 20 to 90 seconds. > > --- > > This patch simply removes source info from NMT call stacks. They are not that important for pinpointing leaks and such. I considered more involved solutions, like making them optional via an argument to the NMT report command, but decided against it. The added benefit would be small, not worth much complexity. > > With this patch, on my box with -conc 4 all NMT together are about 2.5 x faster (2m56 -> 1m09). LGTM ------------- Marked as reviewed by mbaesken (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11135 From jbhateja at openjdk.org Tue Nov 15 12:26:59 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 15 Nov 2022 12:26:59 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <4VgZ82kW_Fc5dwN2IimRW7StzUF8tWaJjDq4hRrhUoI=.e943ec6d-c3db-4655-8744-39c858767b45@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> <4VgZ82kW_Fc5dwN2IimRW7StzUF8tWaJjDq4hRrhUoI=.e943ec6d-c3db-4655-8744-39c858767b45@github.com> Message-ID: <iLXeAaY0S0zLcoAcHBUy6YuuS2wY3zOlW_X97NW6KSg=.53849487-ab38-4ad4-8112-3cc70c5699a0@github.com> On Mon, 14 Nov 2022 23:16:42 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: > @sviswa7 or @jatin-bhateja do you agree with these changes? Patch shows significant improvement and better port utilization with 3+ micro ops on CLX.

JDK-With-opt:
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score  Error   Units
MessageDigests.digest             md5        64     DEFAULT  thrpt    2  5613.517         ops/ms
MessageDigests.digest             md5     16384     DEFAULT  thrpt    2    50.026         ops/ms

43,24,11,23,563 exe_activity.1_ports_util (79.97%)
54,01,28,04,330 exe_activity.2_ports_util (80.22%)
25,20,63,64,512 exe_activity.3_ports_util (80.00%)
 6,42,47,64,948 exe_activity.4_ports_util (79.83%)

JDK-baseline:
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score  Error   Units
MessageDigests.digest             md5        64     DEFAULT  thrpt    2  4087.112         ops/ms
MessageDigests.digest             md5     16384     DEFAULT  thrpt    2    35.291         ops/ms

50,76,35,89,853 exe_activity.1_ports_util (80.09%)
36,59,68,98,931 exe_activity.2_ports_util (79.89%)
 9,61,69,23,581 exe_activity.3_ports_util (80.02%)
 1,88,94,94,202 exe_activity.4_ports_util (79.98%)

------------- PR: https://git.openjdk.org/jdk/pull/11054 From mbaesken at openjdk.org Tue Nov 15 12:33:07 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 15 Nov 2022 12:33:07 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v2]
In-Reply-To: <925asUQVBWNrlwBiUMVJLWMFpPsKeAG40UYd8hI90pc=.364ae2b4-f7fb-437b-92d3-6825cb5e8879@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> <925asUQVBWNrlwBiUMVJLWMFpPsKeAG40UYd8hI90pc=.364ae2b4-f7fb-437b-92d3-6825cb5e8879@github.com> Message-ID: <YErSRxTVyCXUf6BfQUJVfKPQ5cLmmAx3MUes0pZJYwA=.237b38dc-2781-49f2-8725-3cd7c2190699@github.com> On Mon, 14 Nov 2022 07:10:39 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. >> >> The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > feedback david Looks good to me. Probably you should adjust the SAP copyright header line in [src/hotspot/share/utilities/vmError.hpp](https://github.com/openjdk/jdk/pull/11122/files#diff-d01eccc98068519f691bbabbba6c192eb576d7c60a4e06567db0768141787f35) to 2022 . ------------- Marked as reviewed by mbaesken (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11122 From jwaters at openjdk.org Tue Nov 15 12:34:04 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 15 Nov 2022 12:34:04 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <PqiJunKhY85BbqgIB5X96kg16Dm-mTszPAFzYOjsfWs=.ebb72cc2-e4e0-4e8f-823e-1f5f97881a6a@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <cvaF0K8j70TF2vlmgBHzQYAOiYSeP7uZ_Y4yavC0J2w=.e69dd78a-7b52-4a00-969e-e4a195623c2b@github.com> <PqiJunKhY85BbqgIB5X96kg16Dm-mTszPAFzYOjsfWs=.ebb72cc2-e4e0-4e8f-823e-1f5f97881a6a@github.com> Message-ID: <cRXeR6mdu1s4Vy4P-1TDPTCdnso0soubbR2tSeVvYiY=.3d543774-2096-4b96-89e7-50aec40fa96b@github.com> On Mon, 14 Nov 2022 13:06:28 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Yep, just something that C++ does a little neater, at least in my view > > We (the HotSpot Group) have not discussed or approved the use of the new C++ attribute syntax, whether for standard attributes or compiler-specific ones. That involves an update to the Style Guide. I'm not convinced switching existing uses from compiler-specific `__attribute__` syntax to compiler-specific `[[attribute]]` syntax is worth the substantial code churn. I was under the assumption that anything not listed under "Not discussed" or "Forbidden" was free game in HotSpot, I apologise for my goof since that isn't the case. 
That said, I am curious if there are any isolated conversations about the attribute syntax anywhere ------------- PR: https://git.openjdk.org/jdk/pull/11081 From mcimadamore at openjdk.org Tue Nov 15 12:34:43 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 12:34:43 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Add `since` tag in Module/ModuleLayer preview methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/3d9cebde..b2dd8926 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=20-21 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From stuefe at openjdk.org Tue Nov 15 12:49:02 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 12:49:02 GMT Subject: RFR: JDK-8296931: NMT tests slowed down considerably by JDK-8242181 In-Reply-To: <8I1jJI5BNTHGrdHRVwvGKphjM0ll1IQTdSwYne8fj0U=.18fcc929-6d39-42f5-a719-51d0d1a34ce8@github.com> References: 
<UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> <8I1jJI5BNTHGrdHRVwvGKphjM0ll1IQTdSwYne8fj0U=.18fcc929-6d39-42f5-a719-51d0d1a34ce8@github.com> Message-ID: <gvEkeCNsSFyS1yuvtorU4AMF5KEh1_Iwr_pR9SeeqiI=.6b17daa4-0ba7-407e-9464-e0d2de2139b2@github.com> On Tue, 15 Nov 2022 12:23:58 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: > LGTM Thank you! ------------- PR: https://git.openjdk.org/jdk/pull/11135 From redestad at openjdk.org Tue Nov 15 12:51:10 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 15 Nov 2022 12:51:10 GMT Subject: RFR: 8296429: Remove os::supports_sse Message-ID: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> os::supports_sse only exists to be backwards compatible with linux kernels older than 2.4, which may not have SSE support. Since support for 2.2.x kernels ended in 2004 I think we can safely clean this out. ------------- Commit messages: - Remove os::supports_sse Changes: https://git.openjdk.org/jdk/pull/11164/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11164&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296429 Stats: 37 lines in 6 files changed: 0 ins; 37 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11164.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11164/head:pull/11164 PR: https://git.openjdk.org/jdk/pull/11164 From stuefe at openjdk.org Tue Nov 15 12:52:57 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 12:52:57 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v3] In-Reply-To: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> Message-ID: <W6A7cfHKo0SZoEVlXxUDutY4t10ufyOxH81t_d8-9ig=.49d9813d-5f28-4986-8ffa-eb2fe1027d8a@github.com> > We have VMError::controlled_crash() in debug 
builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. > > The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: fix copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11122/files - new: https://git.openjdk.org/jdk/pull/11122/files/33b758ca..a931ea8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11122&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11122&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11122.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11122/head:pull/11122 PR: https://git.openjdk.org/jdk/pull/11122 From stuefe at openjdk.org Tue Nov 15 12:52:58 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 12:52:58 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v2] In-Reply-To: <YErSRxTVyCXUf6BfQUJVfKPQ5cLmmAx3MUes0pZJYwA=.237b38dc-2781-49f2-8725-3cd7c2190699@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> <925asUQVBWNrlwBiUMVJLWMFpPsKeAG40UYd8hI90pc=.364ae2b4-f7fb-437b-92d3-6825cb5e8879@github.com> 
<YErSRxTVyCXUf6BfQUJVfKPQ5cLmmAx3MUes0pZJYwA=.237b38dc-2781-49f2-8725-3cd7c2190699@github.com> Message-ID: <yxHmdIbY8vL26fmZKrZzvMbob_cOxHZ-3v9R-wd7ncQ=.64cf6d4f-faa0-4e6d-ba43-f9340c80d0df@github.com> On Tue, 15 Nov 2022 12:31:03 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote: > Looks good to me. Probably you should adjust the SAP copyright header line in [src/hotspot/share/utilities/vmError.hpp](https://github.com/openjdk/jdk/pull/11122/files#diff-d01eccc98068519f691bbabbba6c192eb576d7c60a4e06567db0768141787f35) to 2022 . Thank you @MBaesken. Copyright fixed. ------------- PR: https://git.openjdk.org/jdk/pull/11122 From duke at openjdk.org Tue Nov 15 12:56:00 2022 From: duke at openjdk.org (Yi-Fan Tsai) Date: Tue, 15 Nov 2022 12:56:00 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> Message-ID: <bng2BOi-1VU_th3MM8vVC5YAzaPo3DXCaZH7AlLSILo=.29da7408-6893-4f32-bdfa-46e678c8e84e@github.com> On Wed, 9 Nov 2022 07:57:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote: > The LEA instruction loads the effective address, but MD5 intrinsic uses it for computing values than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. > > This change replaces > LEA: r1 = r1 + rsi * 1 + t > with > ADDs: r1 += t; r1 += rsi. > > Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. > > No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. > > Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake. 
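The replacement described above relies on the LEA form and the two ADDs producing the same value. A minimal C++ sketch of that equivalence, assuming 32-bit register arithmetic with wraparound; `step_lea`, `step_adds` and `check_equivalence` are hypothetical names used only for illustration, they are not part of the intrinsic, and the sketch says nothing about cycle counts:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the LEA form: r1 = r1 + rsi * 1 + t
static std::uint32_t step_lea(std::uint32_t r1, std::uint32_t rsi, std::uint32_t t) {
  return r1 + rsi * 1u + t;
}

// Hypothetical model of the two-ADD form: r1 += t; r1 += rsi
static std::uint32_t step_adds(std::uint32_t r1, std::uint32_t rsi, std::uint32_t t) {
  r1 += t;
  r1 += rsi;
  return r1;
}

// Unsigned addition is associative and commutative modulo 2^32,
// so both forms agree even when the sum wraps around.
static void check_equivalence() {
  const std::uint32_t t = 0xd76aa478u;  // first MD5 round constant, used as sample data
  assert(step_lea(1u, 2u, t) == step_adds(1u, 2u, t));
  assert(step_lea(0xFFFFFFFFu, 0xFFFFFFFFu, t) == step_adds(0xFFFFFFFFu, 0xFFFFFFFFu, t));
}
```

Only the result is modelled here; any throughput difference depends on the microarchitecture, as the benchmark numbers in this thread show.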
Performance without the optimization on Cascade Lake:

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score      Error   Units
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3315.328 ±   65.799  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    27.482 ±    0.006  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2916.207 ±  127.293  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    27.381 ±    0.003  ops/ms

Performance with optimization on Cascade Lake:

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score      Error   Units
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  4474.780 ±   17.583  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    38.926 ±    0.005  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3796.684 ±  153.887  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    38.724 ±    0.005  ops/ms

------------- PR: https://git.openjdk.org/jdk/pull/11054 From stuefe at openjdk.org Tue Nov 15 13:08:59 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 13:08:59 GMT Subject: RFR: 8296429: Remove os::supports_sse In-Reply-To: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> References: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> Message-ID: <I-IJQQkOSt4gv6LJXkpgAijXnE_HLk9j-Zms27EtbJk=.fe87caa9-7079-4666-a47e-d653087f8a44@github.com> On Tue, 15 Nov 2022 12:42:48 GMT, Claes Redestad <redestad at openjdk.org> wrote: > os::supports_sse only exists to be backwards compatible with linux kernels older than 2.4, which may not have SSE support. Since support for 2.2.x kernels ended in 2004 I think we can safely clean this out. This was only relevant on 32-bit anyway, right? Looks good. 2.4 was released in 2001, so I think it's safe to remove. ------------- Marked as reviewed by stuefe (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11164 From duke at openjdk.org Tue Nov 15 13:53:57 2022 From: duke at openjdk.org (Yi-Fan Tsai) Date: Tue, 15 Nov 2022 13:53:57 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> Message-ID: <atQETPC7Hy8tQCfM6Yg1pYYM0fSHBki1-HdvR9JV7bA=.e61109b4-4007-4468-9be6-310ea527c328@github.com> On Wed, 9 Nov 2022 07:57:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote: > The LEA instruction loads the effective address, but MD5 intrinsic uses it for computing values rather than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. > > This change replaces > LEA: r1 = r1 + rsi * 1 + t > with > ADDs: r1 += t; r1 += rsi. > > Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. > > No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. > > Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.

Performance without the optimization on Ice Lake:

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5402.018 ±  17.033  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.722 ±   0.003  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4652.620 ±  35.432  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.573 ±   0.016  ops/ms

Performance with optimization on Ice Lake:

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5348.594 ±  14.303  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.671 ±   0.008  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4583.530 ±  12.752  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.545 ±   0.006  ops/ms

------------- PR: https://git.openjdk.org/jdk/pull/11054 From duke at openjdk.org Tue Nov 15 13:56:06 2022 From: duke at openjdk.org (zzambers) Date: Tue, 15 Nov 2022 13:56:06 GMT Subject: RFR: 8295952: Problemlist existing compiler/rtm tests also on x86 In-Reply-To: <HI31R9JRc6JkuWSfbSEGyxfbRxBjUsvrqgIZYV7XFwA=.44687b45-54a8-4799-a4b1-be58b6adc271@github.com> References: <mkKq-X8pqY91OIEeoTxdeXvPr3Xsuk2P1IGcVaB0mt0=.b8c17931-2d34-49cd-b22e-03cf8d23ce33@github.com> <HI31R9JRc6JkuWSfbSEGyxfbRxBjUsvrqgIZYV7XFwA=.44687b45-54a8-4799-a4b1-be58b6adc271@github.com> Message-ID: <4pDysZ5rfejMZTjsuofGKhBiZ6GftT5CwyeW2eWWj14=.8bf56418-43b9-4ebe-af4f-e9bb3e73614b@github.com> On Tue, 15 Nov 2022 10:16:22 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> Problemlist should be extended so that existing compiler/rtm entries include x86 (32-bit) Intel builds as well, as these are also affected. > > Looks good and trivial! @chhagedorn Thanks for the review. ------------- PR: https://git.openjdk.org/jdk/pull/10875 From redestad at openjdk.org Tue Nov 15 14:04:56 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 15 Nov 2022 14:04:56 GMT Subject: RFR: 8296429: Remove os::supports_sse In-Reply-To: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> References: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> Message-ID: <OV0LJr53w4ZfmeKhies1vH6N_TSJp4Mze--1zbazf48=.6c722aa8-ebd2-47ee-bf80-975c9c5b534a@github.com> On Tue, 15 Nov 2022 12:42:48 GMT, Claes Redestad <redestad at openjdk.org> wrote: > os::supports_sse only exists to be backwards compatible with linux kernels older than 2.4, which may not have SSE support. Since support for 2.2.x kernels ended in 2004 I think we can safely clean this out.
Correct, the check for kernel version only happens on 32-bit Linux. ------------- PR: https://git.openjdk.org/jdk/pull/11164 From stuefe at openjdk.org Tue Nov 15 14:11:29 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 15 Nov 2022 14:11:29 GMT Subject: RFR: JDK-8296995: ostream should handle snprintf(3) errors in release builds Message-ID: <lXBQPLCrniDiyp9OAfVdsHq3xVNsxxTWymy_1cSaiHg=.0f341fe2-7bcc-4640-bf2e-0f1883ee22e4@github.com> Small fix. All streams in ostream.hpp end up using `outputStream::do_vsnprintf()`, which uses `os::snprintf()`, which uses `::vsnprintf()`. The latter can fail, returning -1, e.g. in case of an encoding error. In that case, we assert in debug builds. In release builds this situation gets misdiagnosed as a buffer overflow because we cast away the signedness of the result and compare it with the output buffer length (see `outputStream::do_vsnprintf()`). The output buffer will be zero-terminated at its end by `os::snprintf()`, but that leaves the rest of the output buffer undefined. The libc may or may not have written parts of the formatted output into it, and may or may not have zero-terminated it. We then proceed to write whatever happens to be in that buffer to the stream sink (see `outputStream::do_vsnprintf_and_write_with_automatic_buffer()` or `outputStream::do_vsnprintf_and_write_with_scratch_buffer()`). --- The patch fixes this: in release builds, we now write nothing. A fatal error would not be good here, since I am not sure that this cannot be triggered by user input. I considered printing a clear marker, e.g. "ENCODING ERROR" instead, and I'm open to suggestions.
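The misdiagnosis described above comes from converting a negative return value to an unsigned type before comparing it with the buffer length. A minimal C++ sketch of the release-build behaviour the patch describes (write nothing on error), assuming a simplified stand-alone wrapper; `usable_length` and `format_checked` are hypothetical names for illustration and are not the actual `outputStream` code:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: map a vsnprintf-style return value to the number of
// characters that may safely be forwarded to a sink.
static std::size_t usable_length(int rc, std::size_t buflen) {
  if (buflen == 0) return 0;                // nothing could be stored at all
  if (rc < 0) return 0;                     // formatting error: forward nothing
  std::size_t len = static_cast<std::size_t>(rc);
  if (len >= buflen) return buflen - 1;     // output was truncated
  return len;
}

// Hypothetical wrapper: format into buf and return the usable length.
// Checking rc < 0 before any unsigned conversion avoids treating -1 as a
// huge length, which is what made the error look like an overflow.
static std::size_t format_checked(char* buf, std::size_t buflen, const char* fmt, ...) {
  assert(buf != nullptr || buflen == 0);
  va_list ap;
  va_start(ap, fmt);
  int rc = std::vsnprintf(buf, buflen, fmt, ap);
  va_end(ap);
  std::size_t len = usable_length(rc, buflen);
  if (buflen > 0) buf[len] = '\0';          // never leave stale bytes visible
  return len;
}
```

A caller would then pass only the first `len` characters to the sink, so an error results in an empty write rather than whatever happened to be in the buffer.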
------------- Commit messages: - JDK-8296995-ostream-handle-sprintf-errors Changes: https://git.openjdk.org/jdk/pull/11160/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11160&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296995 Stats: 27 lines in 2 files changed: 27 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11160.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11160/head:pull/11160 PR: https://git.openjdk.org/jdk/pull/11160 From coleenp at openjdk.org Tue Nov 15 14:15:02 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Nov 2022 14:15:02 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <gay-N6xDnfKHcngB9ddJIZD6Jfg2m_ZCzZn1gWPFN-o=.785036e8-1d1d-41d3-bac3-211b9d03cd71@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <gay-N6xDnfKHcngB9ddJIZD6Jfg2m_ZCzZn1gWPFN-o=.785036e8-1d1d-41d3-bac3-211b9d03cd71@github.com> Message-ID: <iF4tdByM8y7NzhFfm2dbJ38l39Ld8CZIPdt3zC7w_d4=.eda73685-fe4a-41cf-b39f-d8134c6c1f9c@github.com> On Mon, 14 Nov 2022 16:12:48 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> After [JDK-8292008](https://bugs.openjdk.org/browse/JDK-8292008) and [JDK-8247283](https://bugs.openjdk.org/browse/JDK-8247283), some C and C++ code across the JDK can be replaced and simplified with cleaner language features that were previously not available due to required compatibility with the now unsupported Visual C++ 2017 compiler. These cleanups were highlighted by the very briefly integrated 8296115 >> >> No changes to the behaviour of the JDK has resulted in any way from this commit > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert to using simpler solution similar to the original 8274980 Can you have separate PRs for each type of thing you're changing? 
------------- PR: https://git.openjdk.org/jdk/pull/11081 From aph at openjdk.org Tue Nov 15 14:20:43 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 14:20:43 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) Message-ID: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> JEP 429 implementation. ------------- Commit messages: - Update StressStackOverflow - Release _scopedValueCache after use - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 - Update src/java.base/share/classes/jdk/internal/vm/ScopedValueContainer.java - Rename test - Update - Whitespace - Rewrite ScopedValue stress test - Reviewer feedback - Reviewer feedback - ... and 10 more: https://git.openjdk.org/jdk/compare/d4d183ed...0d72ca2f Changes: https://git.openjdk.org/jdk/pull/10952/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8286666 Stats: 3272 lines in 52 files changed: 2868 ins; 246 del; 158 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Tue Nov 15 14:20:48 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 14:20:48 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) In-Reply-To: <ZZdAKZ9bIbjt0_jcpXkqAsHbFAjmp6cK8pPC8Y9bks4=.6f1b1a45-8434-4275-97e9-8d02bfb828bf@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <ZZdAKZ9bIbjt0_jcpXkqAsHbFAjmp6cK8pPC8Y9bks4=.6f1b1a45-8434-4275-97e9-8d02bfb828bf@github.com> Message-ID: <dkd9EyTzZJXTs_6-x0NF6VpAhP9CXyi2o9fla7WcOJM=.3cc41e07-6f17-4d9d-ad98-e372a0c421bb@github.com> On Fri, 4 Nov 2022 23:17:32 GMT, Dean Long <dlong at openjdk.org> wrote: >> JEP 429 implementation. 
> > src/hotspot/share/prims/jvm.cpp line 1410: > >> 1408: loc = 3; >> 1409: } else if (method == resolver.thread_run_method) { >> 1410: loc = 2; > > This depends on how javac numbers locals, right? It seems a bit fragile. This is one of the reasons why doPrivileged uses the helper method executePrivileged, so the locals are arguments, giving them predictable offsets. Ah, good point. I'll have a look at doing that. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From dlong at openjdk.org Tue Nov 15 14:20:47 2022 From: dlong at openjdk.org (Dean Long) Date: Tue, 15 Nov 2022 14:20:47 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <ZZdAKZ9bIbjt0_jcpXkqAsHbFAjmp6cK8pPC8Y9bks4=.6f1b1a45-8434-4275-97e9-8d02bfb828bf@github.com> On Wed, 2 Nov 2022 16:23:34 GMT, Andrew Haley <aph at openjdk.org> wrote: > JEP 429 implementation. src/hotspot/share/prims/jvm.cpp line 1410: > 1408: loc = 3; > 1409: } else if (method == resolver.thread_run_method) { > 1410: loc = 2; This depends on how javac numbers locals, right? It seems a bit fragile. This is one of the reasons why doPrivileged uses the helper method executePrivileged, so the locals are arguments, giving them predictable offsets. src/java.base/share/classes/jdk/internal/vm/ScopedValueContainer.java line 53: > 51: /** > 52: * Returns the "latest" ScopedValueContainer for the current Thread. This may be on > 53: * the current thread's scope task or ma require walking up the tree to find it. Suggestion: * the current thread's scope task or may require walking up the tree to find it. 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Tue Nov 15 14:20:48 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 14:20:48 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) In-Reply-To: <dkd9EyTzZJXTs_6-x0NF6VpAhP9CXyi2o9fla7WcOJM=.3cc41e07-6f17-4d9d-ad98-e372a0c421bb@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <ZZdAKZ9bIbjt0_jcpXkqAsHbFAjmp6cK8pPC8Y9bks4=.6f1b1a45-8434-4275-97e9-8d02bfb828bf@github.com> <dkd9EyTzZJXTs_6-x0NF6VpAhP9CXyi2o9fla7WcOJM=.3cc41e07-6f17-4d9d-ad98-e372a0c421bb@github.com> Message-ID: <buDIXmfTPJUOV6ITOJzStRgl2rTMugKZov6HCrNSHgY=.e8cdf866-d68e-4edd-922e-a7c776019ff0@github.com> On Thu, 10 Nov 2022 17:42:38 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/share/prims/jvm.cpp line 1410: >> >>> 1408: loc = 3; >>> 1409: } else if (method == resolver.thread_run_method) { >>> 1410: loc = 2; >> >> This depends on how javac numbers locals, right? It seems a bit fragile. This is one of the reasons why doPrivileged uses the helper method executePrivileged, so the locals are arguments, giving them predictable offsets. > > Ah, good point. I'll have a look at doing that. Done. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From alanb at openjdk.org Tue Nov 15 14:20:50 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 15 Nov 2022 14:20:50 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <2qiwIUFK4kh7KzfhV8rlx9lXY79B_LkUvFo3E9hKDyM=.682039e2-78be-4864-a90a-3bd6e5372547@github.com> On Wed, 2 Nov 2022 16:23:34 GMT, Andrew Haley <aph at openjdk.org> wrote: > JEP 429 implementation. 
src/hotspot/share/prims/jvm.cpp line 4072: > 4070: */ > 4071: JVM_ENTRY(void, JVM_EnsureMaterializedForStackWalk_func(JNIEnv* env, jobject vthread, jobject value)) > 4072: //asm("nop"); The asm("nop") was commented out to get the build working. It's inserted by the compiler now, so I assume the commented-out asm can be removed. src/java.base/share/classes/java/lang/VirtualThread.java line 318: > 316: } > 317: } > 318: @Hidden Can we rename this to runWith(Runnable, Object) in both Thread and VirtualThread to keep the naming consistent if we can? ------------- PR: https://git.openjdk.org/jdk/pull/10952 From duke at openjdk.org Tue Nov 15 14:20:53 2022 From: duke at openjdk.org (ExE Boss) Date: Tue, 15 Nov 2022 14:20:53 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <FclwJZRKpjSGCNnl3yRZRRzmvJuQlXNHFmdcQ742p18=.f9ed06dd-4602-400d-aa91-2091b9f26982@github.com> On Wed, 2 Nov 2022 16:23:34 GMT, Andrew Haley <aph at openjdk.org> wrote: > JEP 429 implementation.
src/java.base/share/classes/java/lang/Thread.java line 1610: > 1608: ensureMaterializedForStackWalk(bindings); > 1609: task.run(); > 1610: Reference.reachabilityFence(bindings); This should probably be in a `try`-`finally` block: Suggestion:

    try {
        task.run();
    } finally {
        Reference.reachabilityFence(bindings);
    }

src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 481: > 479: } > 480: */ > 481: return findBinding() != Snapshot.NIL; This should probably call `Cache.put(this, value)` when `findBinding()` isn't `Snapshot.NIL`, since it's likely that `isBound()` will most commonly be used in the form of:

    if (SCOPED_VALUE.isBound()) {
        final var value = SCOPED_VALUE.get();
        // do something with `value`
    }

--------------------------------------------------------------------------------

Suggestion:

    var value = findBinding();
    if (value == Snapshot.NIL) {
        return false;
    }
    Cache.put(this, value);
    return true;

------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Tue Nov 15 14:20:53 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 14:20:53 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) In-Reply-To: <FclwJZRKpjSGCNnl3yRZRRzmvJuQlXNHFmdcQ742p18=.f9ed06dd-4602-400d-aa91-2091b9f26982@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <FclwJZRKpjSGCNnl3yRZRRzmvJuQlXNHFmdcQ742p18=.f9ed06dd-4602-400d-aa91-2091b9f26982@github.com> Message-ID: <40rwB5m6Mskkevkwkj8B34o540txfesN7P-pOGWPfqA=.4cf0adb3-1e3d-4a87-b2bf-505c7f15d487@github.com> On Thu, 3 Nov 2022 11:50:17 GMT, ExE Boss <duke at openjdk.org> wrote: >> JEP 429 implementation.
> > src/java.base/share/classes/java/lang/Thread.java line 1610: > >> 1608: ensureMaterializedForStackWalk(bindings); >> 1609: task.run(); >> 1610: Reference.reachabilityFence(bindings); > > This should probably be in a `try`-`finally` block: > Suggestion:
>
>     try {
>         task.run();
>     } finally {
>         Reference.reachabilityFence(bindings);
>     }

I wonder. The pattern I'm using here is based on `AccessController.executePrivileged`, which doesn't have the `finally` clause. Perhaps I should add one here anyway. > src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 481: > >> 479: } >> 480: */ >> 481: return findBinding() != Snapshot.NIL; > > This should probably call `Cache.put(this, value)` when `findBinding()` isn't `Snapshot.NIL`, since it's likely that `isBound()` will most commonly be used in the form of:
>
>     if (SCOPED_VALUE.isBound()) {
>         final var value = SCOPED_VALUE.get();
>         // do something with `value`
>     }
>
> --------------------------------------------------------------------------------
>
> Suggestion:
>
>     var value = findBinding();
>     if (value == Snapshot.NIL) {
>         return false;
>     }
>     Cache.put(this, value);
>     return true;

Probably so, yes. I'll have a look at that along with caching failure. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Tue Nov 15 14:20:56 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 14:20:56 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <3lVtaG-K0g_3M_-8_lszieivuJ4a88CKmyWj0MH8cgU=.fc28bd92-72ec-4eff-962f-06c7404954fd@github.com> On Wed, 2 Nov 2022 16:23:34 GMT, Andrew Haley <aph at openjdk.org> wrote: > JEP 429 implementation.
src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 475: > 473: // ??? Do we want to search cache for this? In most cases we don't expect > 474: // this {@link ScopedValue} to be bound, so it's not worth it. But I may > 475: // be wrong about that. We should make it switchable by a runtime parameter. We should optionally cache failures. Add a microbenchmark for that. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From dlong at openjdk.org Tue Nov 15 14:20:54 2022 From: dlong at openjdk.org (Dean Long) Date: Tue, 15 Nov 2022 14:20:54 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) In-Reply-To: <40rwB5m6Mskkevkwkj8B34o540txfesN7P-pOGWPfqA=.4cf0adb3-1e3d-4a87-b2bf-505c7f15d487@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <FclwJZRKpjSGCNnl3yRZRRzmvJuQlXNHFmdcQ742p18=.f9ed06dd-4602-400d-aa91-2091b9f26982@github.com> <40rwB5m6Mskkevkwkj8B34o540txfesN7P-pOGWPfqA=.4cf0adb3-1e3d-4a87-b2bf-505c7f15d487@github.com> Message-ID: <kXgHNrHUMWBv50mRCkzuOOT9wNCoNzpSHGmaCmP5gGE=.27f2e129-b410-4a74-a14b-c41ea260f72e@github.com> On Fri, 4 Nov 2022 09:53:39 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/java.base/share/classes/java/lang/Thread.java line 1610: >> >>> 1608: ensureMaterializedForStackWalk(bindings); >>> 1609: task.run(); >>> 1610: Reference.reachabilityFence(bindings); >> >> This?should probably?be?in a?`try`?`finally`?block: >> Suggestion: >> >> try { >> task.run(); >> } finally { >> Reference.reachabilityFence(bindings); >> } > > I wonder. The pattern I'm using here is based on `AccessController.executePrivileged`, which doesn't have the `finally` clause. Perhaps I should add one here anyway. I hope it doesn't matter. There is an example in the reachabilityFence javadocs where it does not use finally. 
For it to matter, I think the compiler would need to inline through run() and prove that it can throw an exception, but I don't think that's how the JIT compilers currently implement reachabilityFence. I suppose a `finally` shouldn't hurt, however. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Tue Nov 15 14:20:55 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 14:20:55 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) In-Reply-To: <2qiwIUFK4kh7KzfhV8rlx9lXY79B_LkUvFo3E9hKDyM=.682039e2-78be-4864-a90a-3bd6e5372547@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <2qiwIUFK4kh7KzfhV8rlx9lXY79B_LkUvFo3E9hKDyM=.682039e2-78be-4864-a90a-3bd6e5372547@github.com> Message-ID: <Yj81L0TMKWcS5VBcJfGyzQQVfAr0mtM10YMyfQJp2WY=.a37e4cff-c9cc-47f2-ad49-661151cc27e0@github.com> On Mon, 14 Nov 2022 17:34:31 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> JEP 429 implementation. > > src/java.base/share/classes/java/lang/VirtualThread.java line 318: > >> 316: } >> 317: } >> 318: @Hidden > > Can we rename this to runWith(Runnable, Object) in both Thread and VirtualThread to keep the naming consistent if we can? OK. I do want to keep the name of this method consistent throughout. There are versions for `Runnable` and `Callable`, and it makes the runtime code cleaner if they have the same name.
------------- PR: https://git.openjdk.org/jdk/pull/10952 From shade at openjdk.org Tue Nov 15 14:26:43 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Nov 2022 14:26:43 GMT Subject: RFR: 8294591: Fix cast-function-type warning in TemplateTable [v6] In-Reply-To: <F9kScSH8YZ9kK8XAcl5pPPE6sPrHCBUc_zNriSaP8EU=.d5c7452c-72ed-4f41-9dea-5bd8a7fccbc0@github.com> References: <F9kScSH8YZ9kK8XAcl5pPPE6sPrHCBUc_zNriSaP8EU=.d5c7452c-72ed-4f41-9dea-5bd8a7fccbc0@github.com> Message-ID: <YFUqNAFHWeIeZjced_rpvGUTPuEr0vnWkSgcyyB7kL4=.c89e3fb4-5122-4ed3-88c2-7215ce3a6620@github.com> > After [JDK-8294314](https://bugs.openjdk.org/browse/JDK-8294314), we would have `templateTable.cpp` excluded with cast-function-type warning. The underlying cause for it is casting functions for `ldc` bytecodes, which take `bool`-typed handlers: Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge branch 'master' into JDK-8294591-warning-cast-function-type-templatetable - Merge branch 'master' into JDK-8294591-warning-cast-function-type-templatetable - Fix build failures - Merge branch 'master' into JDK-8294591-warning-cast-function-type-templatetable - Also disable warnings in gtests - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/10493/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10493&range=05 Stats: 46 lines in 9 files changed: 6 ins; 1 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/10493.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10493/head:pull/10493 PR: https://git.openjdk.org/jdk/pull/10493 From erikj at openjdk.org Tue Nov 15 14:29:21 2022 From: erikj at openjdk.org (Erik Joelsson) Date: Tue, 15 Nov 2022 14:29:21 GMT Subject: RFR: 8296926: Use proper include lines for files in include/ [v2] In-Reply-To: <advo_9N-gWLIYj0pvZdBMKdX99rRMRKTIHUCybir2tI=.09dbeb3c-a27d-4195-98bb-9923d25dda1d@github.com> References: 
<tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> <advo_9N-gWLIYj0pvZdBMKdX99rRMRKTIHUCybir2tI=.09dbeb3c-a27d-4195-98bb-9923d25dda1d@github.com> Message-ID: <TNg63dXWqSlUgpHB5xDsocZHPUOBF4SMYXte1BNR70g=.340a38f8-f45a-4e02-9ea2-954c0fb8434f@github.com> On Tue, 15 Nov 2022 10:42:58 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion suggesting that this was done on purpose, but it just adds another exception to the include rules that doesn't have any practical purpose, IMHO. It also goes against our written style guide around include files. One argument for why it was OK to have the files in include/ pushed up to the top of the sorted block was that the file was included without specifying a directory. That argument contradicts how we treat platform-dependent files, which (unfortunately) are also often specified without a prefixed directory. To remove this special case, I've removed the extraneous make file entry that put src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. >> >> This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Remove include/ from includes Build change looks good. ------------- Marked as reviewed by erikj (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11133 From coleenp at openjdk.org Tue Nov 15 14:40:58 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Nov 2022 14:40:58 GMT Subject: RFR: 8297020: Rename GrowableArray::on_stack In-Reply-To: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> References: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> Message-ID: <GFGFPYs9QEcgyCbEWlX4Mzb8FhlGzf6TO_7Kr8LPzg4=.e5d91032-0f05-4c22-a209-9e6d0708a578@github.com> On Tue, 15 Nov 2022 11:09:56 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > GrowableArray::on_stack is confusing. It returns true if the backing elements are allocated in the stack's resource area. We typically call this resource allocations, not stack allocations. I propose that we rename it. Yes, looks good. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/11161 From pminborg at openjdk.org Tue Nov 15 14:43:08 2022 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 15 Nov 2022 14:43:08 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> Message-ID: <UdYyCJV-_meAqSHaXCtWLJUKBekOlGf1aayXSOanXvE=.2f160547-8912-4194-9f2d-1010a3409985@github.com> On Tue, 15 Nov 2022 12:34:43 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add `since` tag in Module/ModuleLayer preview methods src/java.base/share/classes/java/lang/foreign/Arena.java line 32: > 30: > 31: /** > 32: * An arena controls the lifecycle of one or more memory segments, providing both flexible allocation and timely deallocation. Strictly: "An arena controls the lifecycle of zero or more ...". A newly created Arena, for example, does not control the lifecycle of any segment. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From pminborg at openjdk.org Tue Nov 15 14:48:01 2022 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 15 Nov 2022 14:48:01 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> Message-ID: <213057Saw0-m7uFTwDAgWYorxtjExq17nhZ0ULRUWGk=.1a10e475-baf5-4c71-a5af-c6288d3db6cc@github.com> On Tue, 15 Nov 2022 12:34:43 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add `since` tag in Module/ModuleLayer preview methods src/java.base/share/classes/java/lang/foreign/Arena.java line 35: > 33: * <p> > 34: * An arena has a {@linkplain #scope() scope}, called the arena scope. 
When the arena is {@linkplain #close() closed}, > 35: * the arena scope becomes not {@linkplain SegmentScope#isAlive() alive}. As a result, all the Suggest "the arena scope is no longer {@linkplain SegmentScope#isAlive() alive}" instead. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From pminborg at openjdk.org Tue Nov 15 14:52:05 2022 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 15 Nov 2022 14:52:05 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> Message-ID: <A58FkB5AN2EXrL-ULzZNkWLrVVCR-bddo8Cj7vwmoPo=.cb1ca360-69e3-46fe-9213-747011b24fdb@github.com> On Tue, 15 Nov 2022 12:34:43 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add `since` tag in Module/ModuleLayer preview methods src/java.base/share/classes/java/lang/foreign/Arena.java line 63: > 61: * <em>after</em> the arena has been closed. The cost of providing this guarantee varies based on the > 62: * number of threads that have access to the memory segments allocated by the arena. 
For instance, if an arena > 63: * is always created and closed by one thread, and the memory segments associated with the arena's scope are always Strictly, if a shared segment is created and is only accessed by a single thread, then we need to track thread usage in order to trivially ensure safety. I think we could reword so that if access is only *allowed* by a single thread, it is trivial. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From pminborg at openjdk.org Tue Nov 15 14:55:14 2022 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 15 Nov 2022 14:55:14 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <A58FkB5AN2EXrL-ULzZNkWLrVVCR-bddo8Cj7vwmoPo=.cb1ca360-69e3-46fe-9213-747011b24fdb@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> <A58FkB5AN2EXrL-ULzZNkWLrVVCR-bddo8Cj7vwmoPo=.cb1ca360-69e3-46fe-9213-747011b24fdb@github.com> Message-ID: <pgAYr6yivaGfYQqw5oZsMEphXH7AL6mYGwsIaq6VRIE=.b7d60942-0f07-4b4f-887d-6c1651488e8d@github.com> On Tue, 15 Nov 2022 14:49:30 GMT, Per Minborg <pminborg at openjdk.org> wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add `since` tag in Module/ModuleLayer preview methods > > src/java.base/share/classes/java/lang/foreign/Arena.java line 63: > >> 61: * <em>after</em> the arena has been closed. The cost of providing this guarantee varies based on the >> 62: * number of threads that have access to the memory segments allocated by the arena. 
For instance, if an arena >> 63: * is always created and closed by one thread, and the memory segments associated with the arena's scope are always > > ~~Strictly, if a shared segment is created and is only accessed by a single thread, then we need to track thread usage in order to trivially ensure safety. I think we could reword so that if access is only *allowed* by a single thread, it is trivial.~~ ok. So reading on the initial text makes sense. So, my comment above should be disregarded. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From jnimeh at openjdk.org Tue Nov 15 14:57:30 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Tue, 15 Nov 2022 14:57:30 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v3] In-Reply-To: <bfvITmE9Ikr80Nagm_5JQuvxK7QoU_sXFGYgnMLfZgQ=.87249f91-ca3f-4529-a1a4-cb642480b0da@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <bfvITmE9Ikr80Nagm_5JQuvxK7QoU_sXFGYgnMLfZgQ=.87249f91-ca3f-4529-a1a4-cb642480b0da@github.com> Message-ID: <oETF54Wm1GiGgxVPDTzdA9cPIXez9Lgu7pq7rif8aHw=.82267473-4f18-433c-942e-dab9dc77dab6@github.com> On Tue, 1 Nov 2022 18:38:21 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: >> >> replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations > > src/hotspot/share/opto/library_call.cpp line 6913: > >> 6911: Node* cc20Blk = make_runtime_call(RC_LEAF|RC_NO_FP, >> 6912: OptoRuntime::chacha20Block_Type(), >> 6913: stubAddr, stubName, TypePtr::BOTTOM, > > BTW it can be further improved: the stub reads from `int[]` and writes into `byte[]` while `TypePtr::BOTTOM` signals both in and out memory state is wide. 
`GraphKit::make_runtime_call()` doesn't support it yet, but if you pass input and output address types separately, it should be possible to turn both into narrow memory and represent the runtime call accordingly (see `wide_in`/`wide_out`-related code in `GraphKit::make_runtime_call()`). Also, it can be done as a follow-up enhancement later. I think I'd like to handle this as a follow-on enhancement. ------------- PR: https://git.openjdk.org/jdk/pull/7702 From aph at openjdk.org Tue Nov 15 14:57:35 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 14:57:35 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v2] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <8Yik6EmjgjWQSz-lAL99OFhis_ZpNTP3IGYyebnggyM=.67de9502-136c-4a17-b6fe-9cb89b72c37a@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Update test - Reviewer feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/0d72ca2f..4bd44d66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From pminborg at openjdk.org Tue Nov 15 15:07:04 2022 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 15 Nov 2022 15:07:04 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> Message-ID: <E4F8uEhS_Dlj_o8mY5HFtYp_mMRCIutC47dNOPxHWeo=.2db3c5cb-12d7-4a39-9a36-e619bfe3d95e@github.com> On Tue, 15 Nov 2022 12:34:43 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add `since` tag in Module/ModuleLayer preview methods src/java.base/share/classes/java/lang/foreign/Arena.java line 100: > 98: * MemorySegment.allocateNative(bytesSize, byteAlignment, scope()); > 99: *} > 100: * More generally implementations of this method must return a native method featuring the requested size, ... must return a native ~~method~~*segment* featuring ... ------------- PR: https://git.openjdk.org/jdk/pull/10872 From pminborg at openjdk.org Tue Nov 15 15:11:15 2022 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 15 Nov 2022 15:11:15 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> Message-ID: <7T_4KxIMV_s9h3OjbniZId7ridKlYeJ1sXu3tgBor2c=.9e0d8f95-554b-4dc3-8b1b-27e78f85578d@github.com> On Tue, 15 Nov 2022 12:34:43 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add `since` tag in Module/ModuleLayer preview methods src/java.base/share/classes/java/lang/foreign/Arena.java line 89: > 87: > 88: /** > 89: * Returns a native memory segment with the given size (in bytes) and alignment constraint (in bytes). 
It is noted that the current documentation does not require a **new** native memory segment to be returned. Would it not be better with: Creates a new native memory segment ... The new shared segment might share actual backing memory though. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From alanb at openjdk.org Tue Nov 15 15:06:05 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 15 Nov 2022 15:06:05 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v2] In-Reply-To: <Yj81L0TMKWcS5VBcJfGyzQQVfAr0mtM10YMyfQJp2WY=.a37e4cff-c9cc-47f2-ad49-661151cc27e0@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <2qiwIUFK4kh7KzfhV8rlx9lXY79B_LkUvFo3E9hKDyM=.682039e2-78be-4864-a90a-3bd6e5372547@github.com> <Yj81L0TMKWcS5VBcJfGyzQQVfAr0mtM10YMyfQJp2WY=.a37e4cff-c9cc-47f2-ad49-661151cc27e0@github.com> Message-ID: <p0meBnMWhQHcgk4Zxz9jbZThPkQNBFNC9mBA54mEqCc=.9f60a7d8-149d-4372-83a7-8166c4b2e3f1@github.com> On Tue, 15 Nov 2022 14:15:26 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/java.base/share/classes/java/lang/VirtualThread.java line 318: >> >>> 316: } >>> 317: } >>> 318: @Hidden >> >> Can we rename this to runWith(Runnable, Object) in both Thread and VirtualThread to keep the naming consistent if we can? > > OK. I do want to keep the name of this method consistent throughout. There are versions for `Runnable` and `Callable` and it makes the runtime code cleaner if they have the same name. The other thing here is that there are two calls to reachabilityFence, it shouldn't need both. 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From pminborg at openjdk.org Tue Nov 15 15:15:45 2022 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 15 Nov 2022 15:15:45 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> Message-ID: <NH1uMbxG-JN-dA5cVca_NycjgMIfSVcTVnABKROOGuQ=.bb383e54-90ab-4207-915a-11e11db80b27@github.com> On Tue, 15 Nov 2022 12:34:43 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add `since` tag in Module/ModuleLayer preview methods src/java.base/share/classes/java/lang/foreign/Arena.java line 119: > 117: > 118: /** > 119: * {@return the arena scope} Add a period ('.') after the closing curly bracket. src/java.base/share/classes/java/lang/foreign/Arena.java line 124: > 122: > 123: /** > 124: * Closes this arena. 
If this method completes normally, the arena scope becomes not {@linkplain SegmentScope#isAlive() alive}, See comment above "not alive" -> "is no longer alive" ------------- PR: https://git.openjdk.org/jdk/pull/10872 From stefank at openjdk.org Tue Nov 15 15:18:02 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 15:18:02 GMT Subject: RFR: 8296926: Use proper include lines for files in include/ [v2] In-Reply-To: <TNg63dXWqSlUgpHB5xDsocZHPUOBF4SMYXte1BNR70g=.340a38f8-f45a-4e02-9ea2-954c0fb8434f@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> <advo_9N-gWLIYj0pvZdBMKdX99rRMRKTIHUCybir2tI=.09dbeb3c-a27d-4195-98bb-9923d25dda1d@github.com> <TNg63dXWqSlUgpHB5xDsocZHPUOBF4SMYXte1BNR70g=.340a38f8-f45a-4e02-9ea2-954c0fb8434f@github.com> Message-ID: <-ok_diV5QiOvTWi2l0uOgCNcev-kAwdqllyekkQWbFg=.faec8a60-67be-4c10-8759-5e1ffe9af0bc@github.com> On Tue, 15 Nov 2022 14:25:15 GMT, Erik Joelsson <erikj at openjdk.org> wrote: > Build change looks good. Thanks, Erik. Actually, that change should have been reverted with the last change. I'll revert that. 
------------- PR: https://git.openjdk.org/jdk/pull/11133 From pminborg at openjdk.org Tue Nov 15 15:00:08 2022 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 15 Nov 2022 15:00:08 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> Message-ID: <GIKXOkQmr1nrn7_PkKpuK53NXxgvLUQZtUXAp8Ba0Y4=.df3ebc05-0ce9-429b-9bc9-420ae3ce93d9@github.com> On Tue, 15 Nov 2022 12:34:43 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add `since` tag in Module/ModuleLayer preview methods src/java.base/share/classes/java/lang/foreign/Arena.java line 79: > 77: * <p> > 78: * Shared arenas, on the other hand, have no owner thread. The segments created by a shared arena > 79: * can be {@linkplain SegmentScope#isAccessibleBy(Thread) accessed} by multiple threads. 
This might be useful when Suggest "can be {@linkplain SegmentScope#isAccessibleBy(Thread) accessed} by ~~multiple~~ *any* thread" ------------- PR: https://git.openjdk.org/jdk/pull/10872 From stefank at openjdk.org Tue Nov 15 15:21:03 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 15 Nov 2022 15:21:03 GMT Subject: RFR: 8296926: Use proper include lines for files in include/ [v3] In-Reply-To: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> Message-ID: <9gZ-d-amnILbr_KlmSakclFAuAwxwYxXi9Mg8AlefI0=.9d8c75b1-212d-40ce-a689-8352ceb802c8@github.com> > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that doesn't have any practical purpose, IMHO. It also goes against our written style guide around include files. One argument why it was OK to have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Revert make file changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11133/files - new: https://git.openjdk.org/jdk/pull/11133/files/a4a479ed..92cba2ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11133&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11133&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11133.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11133/head:pull/11133 PR: https://git.openjdk.org/jdk/pull/11133 From pminborg at openjdk.org Tue Nov 15 15:34:17 2022 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 15 Nov 2022 15:34:17 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> Message-ID: <rolvXeYI5PDmoX-Q5l7obDmasgTF9Dzk-dOMmMO80ug=.3423895e-bf7a-427e-becb-b76479f6c72f@github.com> On Tue, 15 Nov 2022 12:34:43 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add `since` tag in Module/ModuleLayer preview methods src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 163: > 161: * segment is derived from the address of the original segment, by adding an offset (expressed in bytes). 
The size of > 162: * the sliced segment is either derived implicitly (by subtracting the specified offset from the size of the original segment), > 163: * or provided explicitly. In other words, a sliced segment has <em>stricter</em> spatial bounds than those of the original segment: Strictly, a sliced segment can have the *same* spatial bounds as the original segment. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Tue Nov 15 15:38:52 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 15:38:52 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <rolvXeYI5PDmoX-Q5l7obDmasgTF9Dzk-dOMmMO80ug=.3423895e-bf7a-427e-becb-b76479f6c72f@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> <rolvXeYI5PDmoX-Q5l7obDmasgTF9Dzk-dOMmMO80ug=.3423895e-bf7a-427e-becb-b76479f6c72f@github.com> Message-ID: <v8I8-DqGKUAuLfw90GJxVUw8do6sZO_xUb5sALQ_bII=.6992ccd9-0da3-46d8-b819-bc7bfdc22da6@github.com> On Tue, 15 Nov 2022 15:31:58 GMT, Per Minborg <pminborg at openjdk.org> wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add `since` tag in Module/ModuleLayer preview methods > > src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 163: > >> 161: * segment is derived from the address of the original segment, by adding an offset (expressed in bytes). The size of >> 162: * the sliced segment is either derived implicitly (by subtracting the specified offset from the size of the original segment), >> 163: * or provided explicitly. 
In other words, a sliced segment has <em>stricter</em> spatial bounds than those of the original segment: > > Strictly, a sliced segment can have the *same* spatial bounds as the original segment. True - but I think the current text is a good compromise e.g. it is narrative text. You can always dive into `MemorySegment::asSlice` and find out more. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From pminborg at openjdk.org Tue Nov 15 15:28:52 2022 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 15 Nov 2022 15:28:52 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> Message-ID: <D5wT2ztN28Ffda06oR6nz-pB_1Tz7Wli1JP-PDuRlp0=.aa077176-1e7a-44f7-9aa1-7c20eb40a2da@github.com> On Tue, 15 Nov 2022 12:34:43 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add `since` tag in Module/ModuleLayer preview methods src/java.base/share/classes/java/lang/foreign/Arena.java line 136: > 134: > 135: /** > 136: * {@return {@code true} if the provided thread can close this arena} I think this is equivalent and simpler: {@return if the provided thread can close this arena}. But I know there are many examples of {@code true} in the JDK. 
src/java.base/share/classes/java/lang/foreign/GroupLayout.java line 46: > 44: > 45: /** > 46: * Returns the member layouts associated with this group. We may use {@return the member layouts associated with this group}. src/java.base/share/classes/java/lang/foreign/Linker.java line 264: > 262: > 263: /** > 264: * Returns a symbol lookup for symbols in a set of commonly used libraries. Use {@return ...} ------------- PR: https://git.openjdk.org/jdk/pull/10872 From rkennke at openjdk.org Tue Nov 15 15:51:08 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 15 Nov 2022 15:51:08 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <_C2oCFsbq1QdFO_HjwfXHNt0XrtV06TqRK1a8lpiXsI=.4650c115-d734-4655-bc6a-ec46314ab5ed@github.com> References: <mgQHdsI_oHeWVQEOQNQLrfplcvatEauNPfq1rEswJF4=.cc842c02-e7e0-4f72-95a0-1033ce101cfe@github.com> <TlQR1R0Jt3DqqMWNwZzUyRpSs7lqI0Cig2zpUnmYI3s=.e3add258-443b-4adb-9b31-9f9a76042ff4@github.com> <NDsoMk5BjB0oLGW6pQagOrm-CWrPjc-wwfRhA3vJt6g=.10119bb3-2d8b-4d53-bfaa-9f7a01dabfd7@github.com> <TCLX4fpqVeou-wQDj1SBil2xIyIzk2NFcj7pPpz_xjs=.4152872e-18aa-426b-b967-68118f7ba62d@github.com> <6KaO6YDJAQZSps49h6TddX8-aXFEfOFCfLgpi1_90Ag=.d7fe0ac9-d392-4784-a13e-85f5212e00f1@github.com> <_C2oCFsbq1QdFO_HjwfXHNt0XrtV06TqRK1a8lpiXsI=.4650c115-d734-4655-bc6a-ec46314ab5ed@github.com> Message-ID: <q0GwEDtslrRgEVtm3TUjNLnXYigllX6_QQQCK7pbnlk=.9e132f11-60c2-45a6-b197-e454037ad901@github.com> On Mon, 14 Nov 2022 22:59:22 GMT, John R Rose <jrose at openjdk.org> wrote: > > So the data structure for lock records (per thread) could consist of a series of distinct values [ A B C ] and each of the values could be repeated, but only adjacently: [ A A A B C C ] for example. > > @rose00 why only adjacently? Nested locking can be interleaved on different monitors. > > Yes it can; you can have nesting A, B, A. But the thread-based fast-locking list might not cover that case. 
If it were restricted to only adjacent records in the way I sketched, it would need to use a different, slower technique for the A, B, A case. The trade-off is that if you only allow adjacent recursive locks on the list, you don't need to search the list beyond the first element, to detect re-locking. Dunno if that pencils out to a real advantage, though, since the fallback is slow. TBH, I don't currently think that making fast-locking recursive is very important. In fact, the need for the fast-locking appears somewhat questionable to begin with - the scenario where it performs better than OM-locking is rather narrow and really only relevant for legacy code. Stack-locking and fast-locking only help workloads that 1. Do lots of uncontended, e.g. single-threaded locking and 2. Churn lots of monitor objects. It is not enough to use a single Vector a lot - the cost of allocating the OM would soon be amortized by lots of OM action. In order for stack-/fast-locking to be useful, you have to have a workload that keeps allocating new lock objects and uses them only once or very few times. For example, I have seen this in OpenJDK's XML code, where the XSLT compiler would generate code that uses an ungodly amount of StringBuffers (this probably warrants a separate fix). Now, where would recursive locking support for the fast-locking path be useful? I have yet to see a workload that suffers because of a lack of recursive locking support. Implementing recursive fast-locking means we'd have to add code in the fast-path, and that would affect non-recursive locking as well. I'd rather keep the implementation simple and fast. 
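The adjacency trade-off discussed above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not HotSpot code: the class `LockStackSketch`, its `Result` enum and the `lockedBit` set (standing in for a "fast-locked" bit in the object header) are hypothetical names invented for illustration, and contention between threads is not modelled at all.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical model of a per-thread lock-record list that only allows adjacent repeats. */
final class LockStackSketch {
    enum Result { FAST, FAST_RECURSIVE, SLOW_PATH }

    // Lock records of the current thread; the head of the deque is the most recent record.
    private final Deque<Object> records = new ArrayDeque<>();
    // Stand-in for a "fast-locked" bit in the object header (assumption for this model).
    private final Set<Object> lockedBit = Collections.newSetFromMap(new IdentityHashMap<>());

    Result lock(Object o) {
        if (!lockedBit.contains(o)) {      // not locked yet: plain fast path
            lockedBit.add(o);
            records.push(o);
            return Result.FAST;
        }
        if (records.peek() == o) {         // re-lock of the top record: only one element inspected
            records.push(o);
            return Result.FAST_RECURSIVE;
        }
        return Result.SLOW_PATH;           // interleaved nesting such as A, B, A needs a fallback
    }

    void unlock(Object o) {
        if (records.peek() != o) {
            throw new IllegalStateException("unbalanced unlock in this model");
        }
        records.pop();
        // Repeats are adjacent, so if the new top differs, no record for o remains.
        if (records.peek() != o) {
            lockedBit.remove(o);
        }
    }
}
```

In this model, locking A, A, B takes the fast path each time, while a further lock of A (the A, B, A nesting) is reported as `SLOW_PATH` without scanning the list. The extra comparison on the lock path is the kind of added fast-path code the message is wary of.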
------------- PR: https://git.openjdk.org/jdk/pull/10590 From aph at openjdk.org Tue Nov 15 15:52:22 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 15:52:22 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v3] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <upL_LOJRbB9XCJtuvA7rHX3CzZe34ZgbK50GeS_dFhs=.d5490421-c718-4e53-9b7d-cbb24f3a7b0f@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Reviewer feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/4bd44d66..442a04ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From mcimadamore at openjdk.org Tue Nov 15 15:48:28 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 15:48:28 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v23] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <ICzcAF5BFn7G7mmNhTYJZNn2RWaqoUIjzP_uhEC_CtU=.28040345-a748-464a-ba9d-1e4e31ce368f@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. 
A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/b2dd8926..19e0f6d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=21-22 Stats: 11 lines in 1 file changed: 5 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From eastigeevich at openjdk.org Tue Nov 15 16:34:58 2022 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 15 Nov 2022 16:34:58 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <XP0qyXp11-t8K0GNu0s9QSUzTX_YLZGElnfH3V-0KTI=.3fb35ca5-cb29-4dfa-b3b9-110d756de4ec@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> <XP0qyXp11-t8K0GNu0s9QSUzTX_YLZGElnfH3V-0KTI=.3fb35ca5-cb29-4dfa-b3b9-110d756de4ec@github.com> Message-ID: <2O5n6UzCnpjCM-BKPnVxlwrnTmtF73m2A-5K6xdANfI=.25c48d96-8a9f-4f7f-9153-0367995b5dff@github.com> On Mon, 14 Nov 2022 15:47:25 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> The LEA instruction loads the effective address, but MD5 intrinsic uses it for computing values than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. >> >> This change replaces >> LEA: r1 = r1 + rsi * 1 + t >> with >> ADDs: r1 += t; r1 += rsi. >> >> Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. 
>> >> No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. >> >> Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake. > > Could you please post JMH microbenchmarks with and without this change? You can run them with `org.openjdk.bench.java.security.MessageDigests` [1] > > [1] https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/security/MessageDigests.java

@luhenry, @vnkozlov Sorry for the uninformative PR description.

In the MD5 intrinsic stub we use 3 operand LEA. This LEA is on the critical path. The optimization is done according to the Intel 64 and IA-32 Architectures Optimization Reference Manual (Feb 2022), 3.5.1.2:

In Sandy Bridge microarchitecture, there are two significant changes to the performance characteristics of LEA instruction: For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1:
- LEA that has all three source operands: base, index, and offset.
- LEA that uses base and index registers where the base is EBP, RBP, or R13.
- LEA that uses RIP relative addressing mode.
- LEA that uses 16-bit addressing mode.

Assembly/Compiler Coding Rule 30. (ML impact, L generality) If an LEA instruction using the scaled index is on the critical path, a sequence with ADDs may be better.

ADD has had latency 1 and throughput 4 since Haswell (see https://www.agner.org/optimize/instruction_tables.pdf).
From https://www.agner.org/optimize/instruction_tables.pdf, in Ice Lake LEA performance was improved to latency 1 and throughput 2. This explains why there is no improvement on Ice Lake.

The patch correctness was tested with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics.
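Going back to the LEA to ADD replacement described above, a small Java sketch may help to see that the two forms compute the same 32-bit value. This is only a model of the arithmetic, not the stub code: `Md5StepSketch` and its method names are invented, and it assumes for illustration that `rsi` holds the result of the MD5 round function for the step (the mail does not say what `rsi` holds). The latency figures quoted above are what make the second form attractive; the sketch itself says nothing about performance.

```java
// Illustration only: models one MD5 round-1 step in two ways. Names are hypothetical.
final class Md5StepSketch {
    // MD5 round 1 auxiliary function F.
    static int f(int b, int c, int d) {
        return (b & c) | (~b & d);
    }

    // Models the LEA form: a single three-input sum, r1 = r1 + rsi * 1 + t.
    static int stepLeaStyle(int a, int b, int c, int d, int x, int t, int s) {
        int rsi = f(b, c, d);       // assumed role of rsi in this sketch
        int r1 = a + x;
        r1 = r1 + rsi * 1 + t;
        return b + Integer.rotateLeft(r1, s);
    }

    // Models the ADD form: r1 += t; r1 += rsi.
    static int stepAddStyle(int a, int b, int c, int d, int x, int t, int s) {
        int rsi = f(b, c, d);
        int r1 = a + x;
        r1 += t;
        r1 += rsi;
        return b + Integer.rotateLeft(r1, s);
    }

    public static void main(String[] args) {
        // MD5 initial state, first-step constant and shift; x is an arbitrary message word.
        int a = 0x67452301, b = 0xefcdab89, c = 0x98badcfe, d = 0x10325476;
        int x = 0x12345678, t = 0xd76aa478, s = 7;
        boolean same = stepLeaStyle(a, b, c, d, x, t, s) == stepAddStyle(a, b, c, d, x, t, s);
        System.out.println("LEA-style and ADD-style agree: " + same);
    }
}
```

Both forms rely on `int` addition wrapping modulo 2^32, which matches what the 32-bit register arithmetic does.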
The microbenchmark we used:

import org.apache.commons.lang3.RandomStringUtils;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.BenchmarkParams;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class MD5Benchmark {
    private static final int MAX_INPUTS_COUNT = 1000;
    private static final int MAX_INPUT_LENGTH = 128 * 1024;
    private static List<byte[]> inputs;

    static {
        inputs = new ArrayList<>();
        IntStream.rangeClosed(1, MAX_INPUTS_COUNT).forEach(value ->
            inputs.add(RandomStringUtils.randomAlphabetic(MAX_INPUT_LENGTH).getBytes(StandardCharsets.UTF_8)));
    }

    @Param({"64", "128", "256", "512", "1024", "2048", "4096", "8192", "16384", "32768", "65536", "131072"})
    private int data_len;

    @State(Scope.Thread)
    public static class InputData {
        byte[] data;
        int count;
        byte[] expectedDigest;
        byte[] digest;

        @Setup
        public void setup(BenchmarkParams params) {
            data = inputs.get(ThreadLocalRandom.current().nextInt(0, MAX_INPUTS_COUNT));
            count = Integer.parseInt(params.getParam("data_len"));
            expectedDigest = calculateMD5Checksum(data, count);
        }

        @TearDown
        public void check() {
            if (!Arrays.equals(expectedDigest, digest)) {
                throw new RuntimeException("Expected md5 digest:\n" + Arrays.toString(expectedDigest) +
                                           "\nGot:\n" + Arrays.toString(digest));
            }
        }
    }

    @Benchmark
    public void testMD5(InputData in) {
        in.digest = calculateMD5Checksum(in.data, in.count);
    }

    private static byte[] calculateMD5Checksum(byte[] input, int count) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(input, 0, count);
            return md5.digest();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

------------- PR: https://git.openjdk.org/jdk/pull/11054 From
aph at openjdk.org Tue Nov 15 16:37:15 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 16:37:15 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v4] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <fJ7c8YV5FyFoWr_aN3FvOPtHKVQaP357dHygu1t38jY=.6964383d-bc17-4706-9abe-85a60b6817ae@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Reviewer feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/442a04ef..c70945bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=02-03 Stats: 10 lines in 5 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From mcimadamore at openjdk.org Tue Nov 15 15:28:50 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 15:28:50 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <7T_4KxIMV_s9h3OjbniZId7ridKlYeJ1sXu3tgBor2c=.9e0d8f95-554b-4dc3-8b1b-27e78f85578d@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> <7T_4KxIMV_s9h3OjbniZId7ridKlYeJ1sXu3tgBor2c=.9e0d8f95-554b-4dc3-8b1b-27e78f85578d@github.com> Message-ID: <wE3yFS78OA3hPMr2hdx-Cn1prEWIMbwkWKgPJyXhMcM=.941de4e5-e85a-4739-a1f8-ba58a552481d@github.com> On Tue, 15 Nov 2022 15:09:02 GMT, Per Minborg <pminborg 
at openjdk.org> wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add `since` tag in Module/ModuleLayer preview methods > > src/java.base/share/classes/java/lang/foreign/Arena.java line 89: > >> 87: >> 88: /** >> 89: * Returns a native memory segment with the given size (in bytes) and alignment constraint (in bytes). > > It is noted that the current documentation does not require a **new** native memory segment to be returned. Would it not be better with: > > Creates a new native memory segment ... > > The new shared segment might share actual backing memory though. My feeling is that being overly precise over identity might backfire. It is not important whether the segment is a new instance or not. But there is, perhaps, another invariant that is more semantically relevant: e.g. the returned segments (whether new or not, we don't care) should be backed by "disjoint" regions of memory. That is, if the method returns a segment with address `0` and size `100`, calling the method again cannot return a segment whose address is `50` and size is `100`. In principle, the segment allocator interface allows for this (see `SegmentAllocator::prefixAllocator`) - but for an arena, a behavior such as this would be undesirable, IMHO. > src/java.base/share/classes/java/lang/foreign/Arena.java line 119: > >> 117: >> 118: /** >> 119: * {@return the arena scope} > > Add a period ('.') after the closing curly bracket. This is a general comment. I don't think we did this consistently in other places, I'd prefer to leave as is. > src/java.base/share/classes/java/lang/foreign/Arena.java line 136: > >> 134: >> 135: /** >> 136: * {@return {@code true} if the provided thread can close this arena} > > I think this is equivalent and simpler: > > {@return if the provided thread can close this arena}. > > But I know there are many examples of {@code true} in the JDK.
I'll leave as is - we can deal with this cosmetic javadoc issues at a later point. > src/java.base/share/classes/java/lang/foreign/GroupLayout.java line 46: > >> 44: >> 45: /** >> 46: * Returns the member layouts associated with this group. > > We may use {@return the member layouts associated with this group}. Same - I'll leave these tweaks for later. > src/java.base/share/classes/java/lang/foreign/Linker.java line 264: > >> 262: >> 263: /** >> 264: * Returns a symbol lookup for symbols in a set of commonly used libraries. > > Use {@return ...} Same - I'll leave these tweaks for later. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Tue Nov 15 15:58:46 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 15:58:46 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v24] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <2K-hydg-uLovxuhq4-WgeYlZPtj-INuCGlEKieRg77E=.de717cd6-8104-4402-b935-7ccb90199e4f@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix tests broken by MemorySession rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/19e0f6d5..54fb4856 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=22-23 Stats: 290 lines in 37 files changed: 0 ins; 2 del; 288 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From lucy at openjdk.org Tue Nov 15 16:33:00 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 15 Nov 2022 16:33:00 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> Message-ID: <5CbmQLn7JMf11Q_RCVSBYvUiY-TPXV2cFihTq7BbQL0=.b3bb362e-8402-4ec7-91f0-279afaf197f3@github.com> On Mon, 14 Nov 2022 19:44:17 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > delete swp file With the comments from others honoured, changes look good to me. 
I found just one, now obsolete, assert which you may want to delete. ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.org/jdk/pull/11115 From mcimadamore at openjdk.org Tue Nov 15 15:38:49 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 15:38:49 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v22] In-Reply-To: <wE3yFS78OA3hPMr2hdx-Cn1prEWIMbwkWKgPJyXhMcM=.941de4e5-e85a-4739-a1f8-ba58a552481d@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <IChQ_tWAjGyOjB9_sxwMmW79pGr_v-2s8F9-i_suhXw=.465a1051-fdb9-4715-a7e3-0608a78a3ba1@github.com> <7T_4KxIMV_s9h3OjbniZId7ridKlYeJ1sXu3tgBor2c=.9e0d8f95-554b-4dc3-8b1b-27e78f85578d@github.com> <wE3yFS78OA3hPMr2hdx-Cn1prEWIMbwkWKgPJyXhMcM=.941de4e5-e85a-4739-a1f8-ba58a552481d@github.com> Message-ID: <LkU1oaYtTflIao6PXDvdw52m4DQczQJuEIcw61PdtdA=.6134e6c4-4883-468a-9414-7b305428e03a@github.com> On Tue, 15 Nov 2022 15:22:07 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> src/java.base/share/classes/java/lang/foreign/Arena.java line 89: >> >>> 87: >>> 88: /** >>> 89: * Returns a native memory segment with the given size (in bytes) and alignment constraint (in bytes). >> >> It is noted that the current documentation does not require a **new** native memory segment to be returned. Would it not be better with: >> >> Creates a new native memory segment ... >> >> The new shared segment might share actual backing memory though. > > My feeling is that being overly precise over identity might backfire. It is not important whether the segment is a new instance or not. But there is, perhaps, another invariant that is more semantically relevant: e.g. the returned segments (whether new or not, we don't care) should be backed by "disjoint" regions of memory. 
That is, if the method returns a segment with address `0` and size `100`, calling the method again cannot return a segment whose address is `50` and size is `100`. In principle, the segment allocator interface allows for this (see `SegmentAllocator::prefixAllocator`) - but for an arena, a behavior such as this would be undesirable, IMHO. I will add: Furthermore, for any two segments S1, S2 returned by this method, the following invariant must hold: S1.overlappingSlice(S2).isEmpty() == true ------------- PR: https://git.openjdk.org/jdk/pull/10872 From lucy at openjdk.org Tue Nov 15 16:33:02 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 15 Nov 2022 16:33:02 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v4] In-Reply-To: <TwmQmg7Canmom_CSvAthOQIbBZaMPLXLfgaudOsoZD0=.bfb6f137-df79-40cf-b6d6-89b975832d66@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <TwmQmg7Canmom_CSvAthOQIbBZaMPLXLfgaudOsoZD0=.bfb6f137-df79-40cf-b6d6-89b975832d66@github.com> Message-ID: <74-iTHarZs4dtXp8dqJEKYrXRw24WAQHrUorBJ4Tmvc=.e2bb3e95-c538-42cb-94a6-6d3378d5bdab@github.com> On Mon, 14 Nov 2022 05:32:20 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used.
>> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > include missing os head file src/hotspot/share/adlc/output_c.cpp line 536: > 534: int printed = snprintf(args, 37, "0x%x, 0x%x, %u", > 535: resources_used, resources_used_exclusively, element_count); > 536: assert(printed <= 36, "overflow"); if snprintf works correctly (we rely on that), this assert will never fire. ------------- PR: https://git.openjdk.org/jdk/pull/11115 From jvernee at openjdk.org Tue Nov 15 17:17:40 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 15 Nov 2022 17:17:40 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v4] In-Reply-To: <Uxzq6nU2R5YKau-r1EK9sp4_EMOyhnyYzs-OKTwa2HE=.0b518ea9-f5e3-4894-9d71-069a474d56fb@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <FkF6UVTR36LLSMwEk6TQRHD54w34HXls5pI-yguV5ec=.303a0ba0-0f4d-4948-91ef-a2630063ab83@github.com> <Uxzq6nU2R5YKau-r1EK9sp4_EMOyhnyYzs-OKTwa2HE=.0b518ea9-f5e3-4894-9d71-069a474d56fb@github.com> Message-ID: <xIiWIQI6ZJoW_U7605O83Xim22Kof5IgvTLTIIPUjeo=.012324e7-fe2d-4c27-bc53-2b0ec21d0fb6@github.com> On Tue, 8 Nov 2022 20:05:09 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Jorn Vernee has updated the pull request incrementally with three additional commits since the last revision: >> >> - Tweak copyright headers >> - Use @requires to disable some tests on x86 >> - Use AssertionError for internal exceptions > > src/hotspot/cpu/aarch64/downcallLinker_aarch64.cpp line 146: > >> 144: Register tmp2 = r10; >> 145: >> 146: VMStorage shuffle_reg = VMS_R19; > > I'd prefer to see `as_VMStorage(Register)` used instead and all `VMS_...` constants go away. Yes, seems to be possible now that `as_VMStorage` can be made `constexpr`. 
Will do > src/hotspot/cpu/aarch64/vmstorage_aarch64.inline.hpp line 68: > >> 66: } >> 67: >> 68: inline VMStorage as_VMStorage(Register reg) { > > Mark as `constexpr` maybe? Tried this before when `Register` wasn't as `constexpr` friendly due to the reinterpret casts. Seems to work now though (thanks! :)) I'll change all these to `constexpr`. ------------- PR: https://git.openjdk.org/jdk/pull/11019 From kvn at openjdk.org Tue Nov 15 17:24:20 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Nov 2022 17:24:20 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> Message-ID: <TTQu6swuxVommBo8lFp6Bwsu8ZmLiPQEHOXjdRXSBPM=.4f697830-b114-46e6-a383-44423bae6b98@github.com> On Wed, 9 Nov 2022 07:57:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote: > The LEA instruction loads the effective address, but MD5 intrinsic uses it for computing values than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. > > This change replaces > LEA: r1 = r1 + rsi * 1 + t > with > ADDs: r1 += t; r1 += rsi. > > Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. > > No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. > > Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake. Thank you all for providing performance data. Looks good. I will run testing. Do we have other intrinsics which use LEA (not for this fix)? ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11054 From kvn at openjdk.org Tue Nov 15 17:24:20 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Nov 2022 17:24:20 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <atQETPC7Hy8tQCfM6Yg1pYYM0fSHBki1-HdvR9JV7bA=.e61109b4-4007-4468-9be6-310ea527c328@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> <atQETPC7Hy8tQCfM6Yg1pYYM0fSHBki1-HdvR9JV7bA=.e61109b4-4007-4468-9be6-310ea527c328@github.com> Message-ID: <f7QVbMp02_KU1X_aGx_sHLP9AEBoDC9iz1ag837MuB4=.e121d9c8-de83-46ba-8900-da2d3b5f33d9@github.com> On Tue, 15 Nov 2022 13:51:24 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote: >> The LEA instruction loads the effective address, but MD5 intrinsic uses it for computing values than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. >> >> This change replaces >> LEA: r1 = r1 + rsi * 1 + t >> with >> ADDs: r1 += t; r1 += rsi. >> >> Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. >> >> No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. >> >> Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.
>
> Performance without the optimization on Ice Lake:
>
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5402.018 ± 17.033  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.722 ±  0.003  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4652.620 ± 35.432  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.573 ±  0.016  ops/ms
>
>
> Performance with optimization on Ice Lake:
>
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5348.594 ± 14.303  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.671 ±  0.008  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4583.530 ± 12.752  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.545 ±  0.006  ops/ms

@yftsai can you merge latest JDK sources? Some of GHA testing failures should be fixed. ------------- PR: https://git.openjdk.org/jdk/pull/11054 From aph at openjdk.org Tue Nov 15 17:18:19 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 17:18:19 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v5] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <N2dCDLPJeA2u7BzUSEmSGED-0qx7wBr7VLiIgJyBeng=.7d01959f-70c4-4a77-b3be-25609ee8c797@github.com> > JEP 429 implementation.
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix failing serviceability tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/c70945bf..4e650314 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=03-04 Stats: 17 lines in 7 files changed: 14 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Tue Nov 15 17:36:16 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Nov 2022 17:36:16 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix failing serviceability tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/4e650314..e1063d7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=04-05 Stats: 7 lines in 1 file changed: 1 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From jnimeh at openjdk.org Tue Nov 15 17:37:58 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Tue, 15 Nov 2022 17:37:58 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <TTQu6swuxVommBo8lFp6Bwsu8ZmLiPQEHOXjdRXSBPM=.4f697830-b114-46e6-a383-44423bae6b98@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> <TTQu6swuxVommBo8lFp6Bwsu8ZmLiPQEHOXjdRXSBPM=.4f697830-b114-46e6-a383-44423bae6b98@github.com> Message-ID: <RuibdoRTS-4oOmAB4RgZOAdm6MsFBtNsghvVLjIveZU=.22231fd2-2b30-47d6-81f9-c0b0956a1040@github.com> On Tue, 15 Nov 2022 17:18:37 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: > Do we have other intrinsics which use LEA (not for this fix)? My pending ChaCha20 intrinsics ( #7702 ) use LEA for getting the address of constant data to be loaded into SIMD registers. That happens before the 10-iteration loop that implements the 20 rounds (which is the critical section of the intrinsic). 
------------- PR: https://git.openjdk.org/jdk/pull/11054 From thartmann at openjdk.org Tue Nov 15 17:39:59 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Nov 2022 17:39:59 GMT Subject: RFR: 8296956: [JVMCI] HotSpotResolvedJavaFieldImpl.getIndex returns wrong value In-Reply-To: <w6GptGr9YBvbeBfcmfX74o9UdianeYc3YN9q0bH0uBc=.8950f5db-6bfe-4798-b214-53c0299e7f0a@github.com> References: <KZxa8Bort_KINFAAyAIGsZgRF6PBRxJGONS_u8t_rVI=.de61e7e8-518a-4cf5-b2b5-68cf4f71cb42@github.com> <BdI8UaFabbx4sJF1SBGfBp3Jtf271JhHAKO35O8p4zo=.c94fb067-287b-4174-847e-beb007194ce4@github.com> <w6GptGr9YBvbeBfcmfX74o9UdianeYc3YN9q0bH0uBc=.8950f5db-6bfe-4798-b214-53c0299e7f0a@github.com> Message-ID: <4Ki5_vPij80Zut9OnZ92A7I0wZWzePSVBJxIt6HQqPo=.4faec3dd-bc5a-4717-a0c1-9f53f55bf814@github.com> On Tue, 15 Nov 2022 10:53:12 GMT, Doug Simon <dnsimon at openjdk.org> wrote: >> src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 270: >> >>> 268: opcode == Bytecodes.INVOKEVIRTUAL || >>> 269: opcode == Bytecodes.INVOKESPECIAL || >>> 270: opcode == Bytecodes.INVOKESTATIC) { >> >> Indentation looks a bit off. > > That's due to the Eclipse formatter enforced style adopted across JVMCI and Graal. I can add comments to explicitly disable it here if you want but I'm not sure it's worth it. I see. No, that's not worth it then. 
------------- PR: https://git.openjdk.org/jdk/pull/11142 From jvernee at openjdk.org Tue Nov 15 17:42:03 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 15 Nov 2022 17:42:03 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v4] In-Reply-To: <Uxzq6nU2R5YKau-r1EK9sp4_EMOyhnyYzs-OKTwa2HE=.0b518ea9-f5e3-4894-9d71-069a474d56fb@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <FkF6UVTR36LLSMwEk6TQRHD54w34HXls5pI-yguV5ec=.303a0ba0-0f4d-4948-91ef-a2630063ab83@github.com> <Uxzq6nU2R5YKau-r1EK9sp4_EMOyhnyYzs-OKTwa2HE=.0b518ea9-f5e3-4894-9d71-069a474d56fb@github.com> Message-ID: <UuSmts4-l9k1_nSeClPLK29MZVfIcMLYVAJ4zPo5LFo=.91ee4e62-b9d4-4b95-931c-f5b1745ccd48@github.com> On Tue, 15 Nov 2022 00:33:35 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: > * `VMStorage` looks very similar to `VMReg`. What's the purpose of the new representation? `VMReg` encodes stack offsets in slots (of 32 bits), which is not enough to represent every call shape on macosx-aarch64. Byte offsets are needed. We also need some size information for that, e.g. to avoid a store of 1 byte to the stack from overwriting other things with a 64-bit write. `VMStorage` also has a channel for that: either a size in bytes, or a register mask (mask of used segments of a register) that can be used to indicate the size if needed. (`VMReg` sort of did that with `BasicType`, but we now erase every sub-int type to `int`, so that no longer works, and it always felt a bit wrong since we're really moving bits between registers, not typed Java values). See also: https://github.com/openjdk/panama-foreign/pull/699 > * why do you structure the header files the way you do? 
`vmstorage.inline.hpp`, `vmstorage_<cpu>.inline.hpp`, `vmstorageBase.inline.hpp` instead of just `vmstorage.hpp`/`vmstorage_<cpu>.hpp` The CPU header depends on the definition of `VMStorage` to be complete, so I had that include the `vmstorageBase` header, and had the `vmstorage` header include the cpu header. But, looking now, it looks like I can also just include the CPU header at the end of the `vmstorageBase` file, and remove 1 header. I'll do that to make things simpler. ------------- PR: https://git.openjdk.org/jdk/pull/11019 From duke at openjdk.org Tue Nov 15 17:44:12 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Nov 2022 17:44:12 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> Message-ID: <i2KXAYD9RaayPF684xwQgBredelBO5O6oyO28aETB0E=.329c21c6-84cb-4354-849d-0fdf8ca19e59@github.com> On Tue, 15 Nov 2022 00:06:40 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's review >> - live review with Sandhya >> - jcheck >> - Sandhya's review >> - fix windows and 32b linux builds >> - add getLimbs to interface and reviews >> - fix 32-bit build >> - make UsePolyIntrinsics option diagnostic >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - ... 
and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384: > >> 382: void StubGenerator::poly1305_limbs(const Register limbs, const Register a0, const Register a1, const Register a2, bool only128) >> 383: { >> 384: const Register t1 = r13; > > Please, make the temps explicit and lift them into arguments. Otherwise, it's hard to see what registers are clobbered when helper methods are called.

Thanks for pointing this out.. I spent quite a bit of time and went back and forth on 'register allocation'... it does make sense to pass all the temps needed, when the number of temps is small. This is the case for the three `*_limbs_*` functions. Maybe I should indeed do that...

On the other hand, there are functions like `poly1305_multiply8_avx512` and `poly1305_process_blocks_avx512` that use a _lot_ of temp registers. I think it makes sense to keep those as 'function-header declarations'.

Then there are functions like `poly1305_multiply_scalar` that could go either way: it has some temps and 'implicitly clobbered' registers, but probably should stay 'as is'..

I ended up being 'pedantic' and making _all_ temps into 'header variables'. I also tried to comment, but in hindsight those probably mean more to me than to anyone else?

// Register Map:
// GPRs:
// input = rdi
// length = rbx
// accumulator = rcx
// R = r8
// a0 = rsi
// a1 = r9
// a2 = r10
// r0 = r11
// r1 = r12
// c1 = r8;
// t1 = r13
// t2 = r14
// t3 = r15
// t0 = r14
// rscratch = r13
// stack(rsp, rbp)
// imul(rax, rdx)
// ZMMs:
// T: xmm0-6
// C: xmm7-9
// A: xmm13-18
// B: xmm19-24
// R: xmm25-29
...
// Register Map:
// reserved: rsp, rbp, rcx
// PARAMs: rdi, rbx, rsi, r8-r12
// poly1305_multiply_scalar clobbers: r13-r15, rax, rdx
const Register t0 = r14;
const Register t1 = r13;
const Register rscratch = r13;
// poly1305_limbs_avx512 clobbers: xmm0, xmm1
// poly1305_multiply8_avx512 clobbers: xmm0-xmm6
const XMMRegister T0 = xmm2;
...
I think I am ok changing the `*limbs*` functions (even started, before I remembered my train of thought from months back..) but let me know if you agree with the rest of the reasoning? ------------- PR: https://git.openjdk.org/jdk/pull/10582 From eastigeevich at openjdk.org Tue Nov 15 17:50:59 2022 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 15 Nov 2022 17:50:59 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <RuibdoRTS-4oOmAB4RgZOAdm6MsFBtNsghvVLjIveZU=.22231fd2-2b30-47d6-81f9-c0b0956a1040@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> <TTQu6swuxVommBo8lFp6Bwsu8ZmLiPQEHOXjdRXSBPM=.4f697830-b114-46e6-a383-44423bae6b98@github.com> <RuibdoRTS-4oOmAB4RgZOAdm6MsFBtNsghvVLjIveZU=.22231fd2-2b30-47d6-81f9-c0b0956a1040@github.com> Message-ID: <G81t5Qpmn8ssbIISUTdnKeHG5w2AUWjtW8kd0VHBx8Q=.9cc2ddef-0ba6-4bf3-a13f-61a6c266552e@github.com> On Tue, 15 Nov 2022 17:33:50 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: > Do we have other intrinsics which use LEA (not for this fix)? I have plans to look at other uses of LEA in Hotspot. I have not started yet due to other urgent work. ------------- PR: https://git.openjdk.org/jdk/pull/11054 From mcimadamore at openjdk.org Tue Nov 15 17:54:16 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 17:54:16 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v25] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <-RoscJ-7QuJ7y50zTBcxRISETzsAnuWdhDjOKhkcLoU=.99cc6f49-c850-4d93-a40b-4cd953a99cb2@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. 
A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix MapToMemorySegmentTest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/54fb4856..b331a4fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=23-24 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From eastigeevich at openjdk.org Tue Nov 15 18:00:04 2022 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 15 Nov 2022 18:00:04 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <RuibdoRTS-4oOmAB4RgZOAdm6MsFBtNsghvVLjIveZU=.22231fd2-2b30-47d6-81f9-c0b0956a1040@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> <TTQu6swuxVommBo8lFp6Bwsu8ZmLiPQEHOXjdRXSBPM=.4f697830-b114-46e6-a383-44423bae6b98@github.com> <RuibdoRTS-4oOmAB4RgZOAdm6MsFBtNsghvVLjIveZU=.22231fd2-2b30-47d6-81f9-c0b0956a1040@github.com> Message-ID: <StEWkgMGzlkdQmzsr4KKuqQSuTLVs37-zK573UyNJq0=.f24dacd9-cf4a-46e8-8e34-6a4a0fc1db24@github.com> On Tue, 15 Nov 2022 17:33:50 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: > > Do we have other intrinsics which use LEA (not for this fix)? > > My pending ChaCha20 intrinsics ( #7702 ) use LEA for getting the address of constant data to be loaded into SIMD registers. That happens before the 10-iteration loop that implements the 20 rounds (which is the critical section of the intrinsic). From #7702, I see they are not 3-operand LEAs. No need to change them.
------------- PR: https://git.openjdk.org/jdk/pull/11054 From iwalulya at openjdk.org Tue Nov 15 18:05:14 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 15 Nov 2022 18:05:14 GMT Subject: RFR: 8296954: G1: Enable parallel scanning for heap region remset Message-ID: <fqG3Jl3VV3LulSnoR6FoVIqQ3ETGtQZf9ZRS7mKVHDM=.2fc28e74-04df-48b5-a3d7-fd8902a86cc2@github.com> Hi all, Please review this change that allows parallel scanning of a heap region's remembered set. It gives a more balanced workload distribution in cases where cards are unevenly distributed among remembered sets. Testing: Tier 1-3 Thanks ------------- Commit messages: - CHT par iterate - fix BucketsOperation set Changes: https://git.openjdk.org/jdk/pull/11173/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11173&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296954 Stats: 34 lines in 8 files changed: 31 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/11173.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11173/head:pull/11173 PR: https://git.openjdk.org/jdk/pull/11173 From mcimadamore at openjdk.org Tue Nov 15 18:03:39 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 18:03:39 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v26] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <-rih5SODHs0oMsQlaTc_lny0Cz6YvYLa4Arjr3Sf0fA=.755847f0-6a14-4784-85ba-97be21e6656b@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix @since tag in SegmentScope ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/b331a4fd..5f60d052 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From jvernee at openjdk.org Tue Nov 15 18:12:41 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 15 Nov 2022 18:12:41 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v4] In-Reply-To: <xIiWIQI6ZJoW_U7605O83Xim22Kof5IgvTLTIIPUjeo=.012324e7-fe2d-4c27-bc53-2b0ec21d0fb6@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <FkF6UVTR36LLSMwEk6TQRHD54w34HXls5pI-yguV5ec=.303a0ba0-0f4d-4948-91ef-a2630063ab83@github.com> <Uxzq6nU2R5YKau-r1EK9sp4_EMOyhnyYzs-OKTwa2HE=.0b518ea9-f5e3-4894-9d71-069a474d56fb@github.com> <xIiWIQI6ZJoW_U7605O83Xim22Kof5IgvTLTIIPUjeo=.012324e7-fe2d-4c27-bc53-2b0ec21d0fb6@github.com> Message-ID: <bWiQQqT_4595l4EP_ZP_5TQGA4ocK5IiUskxIhwyJWY=.70dd2085-c995-4cdb-9cab-ce78473c9106@github.com> On Tue, 15 Nov 2022 17:13:58 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> src/hotspot/cpu/aarch64/vmstorage_aarch64.inline.hpp line 68: >> >>> 66: } >>> 67: >>> 68: inline VMStorage as_VMStorage(Register reg) { >> >> Mark as `constexpr` maybe? > > Tried this before when `Register` wasn't as `constexpr` friendly due to the reinterpret casts. Seems to work now though (thanks! :)) I'll change all these to `constexpr`. 
Err, looks like this works for MSVC, but with GCC I hit a snag eventually (it also require putting `constexpr` on a lot of the `Register`/`VMReg` API). This error occurs: * For target hotspot_variant-server_libjvm_objs_downcallLinker.o: In file included from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/vmstorage.inline.hpp:100, from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/foreignGlobals.hpp:29, from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/downcallLinker.hpp:27, from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/downcallLinker.cpp:25: ------------- PR: https://git.openjdk.org/jdk/pull/11019 From jvernee at openjdk.org Tue Nov 15 18:12:42 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 15 Nov 2022 18:12:42 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v4] In-Reply-To: <bWiQQqT_4595l4EP_ZP_5TQGA4ocK5IiUskxIhwyJWY=.70dd2085-c995-4cdb-9cab-ce78473c9106@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <FkF6UVTR36LLSMwEk6TQRHD54w34HXls5pI-yguV5ec=.303a0ba0-0f4d-4948-91ef-a2630063ab83@github.com> <Uxzq6nU2R5YKau-r1EK9sp4_EMOyhnyYzs-OKTwa2HE=.0b518ea9-f5e3-4894-9d71-069a474d56fb@github.com> <xIiWIQI6ZJoW_U7605O83Xim22Kof5IgvTLTIIPUjeo=.012324e7-fe2d-4c27-bc53-2b0ec21d0fb6@github.com> <bWiQQqT_4595l4EP_ZP_5TQGA4ocK5IiUskxIhwyJWY=.70dd2085-c995-4cdb-9cab-ce78473c9106@github.com> Message-ID: <hOVavZoBAgHC7BL4uGj7GG-6qbgARitOiaeCIHNcrRg=.fc83cf27-b863-4129-ab3c-e5b9551ca08a@github.com> On Tue, 15 Nov 2022 18:07:18 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Tried this before when `Register` wasn't as `constexpr` friendly due to the reinterpret casts. Seems to work now though (thanks! :)) I'll change all these to `constexpr`. > > Err, looks like this works for MSVC, but with GCC I hit a snag eventually (it also require putting `constexpr` on a lot of the `Register`/`VMReg` API). 
> > This error occurs: > > > * For target hotspot_variant-server_libjvm_objs_downcallLinker.o: > In file included from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/vmstorage.inline.hpp:100, > from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/foreignGlobals.hpp:29, > from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/downcallLinker.hpp:27, > from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/downcallLinker.cpp:25: I guess there's also the possibility that this might trip up compilers on other platforms, even if I can manage to fix it here. ------------- PR: https://git.openjdk.org/jdk/pull/11019 From jvernee at openjdk.org Tue Nov 15 18:19:09 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 15 Nov 2022 18:19:09 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v4] In-Reply-To: <hOVavZoBAgHC7BL4uGj7GG-6qbgARitOiaeCIHNcrRg=.fc83cf27-b863-4129-ab3c-e5b9551ca08a@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <FkF6UVTR36LLSMwEk6TQRHD54w34HXls5pI-yguV5ec=.303a0ba0-0f4d-4948-91ef-a2630063ab83@github.com> <Uxzq6nU2R5YKau-r1EK9sp4_EMOyhnyYzs-OKTwa2HE=.0b518ea9-f5e3-4894-9d71-069a474d56fb@github.com> <xIiWIQI6ZJoW_U7605O83Xim22Kof5IgvTLTIIPUjeo=.012324e7-fe2d-4c27-bc53-2b0ec21d0fb6@github.com> <bWiQQqT_4595l4EP_ZP_5TQGA4ocK5IiUskxIhwyJWY=.70dd2085-c995-4cdb-9cab-ce78473c9106@github.com> <hOVavZoBAgHC7BL4uGj7GG-6qbgARitOiaeCIHNcrRg=.fc83cf27-b863-4129-ab3c-e5b9551ca08a@github.com> Message-ID: <iOYeC0Ha_ygibOjUfw6tJNYCnZf7kdGjjrQZrcJT-ss=.07a186a5-3d81-4e79-aeaa-611eae559702@github.com> On Tue, 15 Nov 2022 18:09:21 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Err, looks like this works for MSVC, but with GCC I hit a snag eventually (it also require putting `constexpr` on a lot of the `Register`/`VMReg` API). 
>> >> This error occurs: >> >> >> * For target hotspot_variant-server_libjvm_objs_downcallLinker.o: >> In file included from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/vmstorage.inline.hpp:100, >> from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/foreignGlobals.hpp:29, >> from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/downcallLinker.hpp:27, >> from /mnt/h/openjdk/foreign-abi/src/hotspot/share/prims/downcallLinker.cpp:25: > > I guess there's also the possibility that this might trip up compilers on other platforms, even if I can manage to fix it here. Looks like it was the next line after the one in the diagnostic that was actually the problem. I seem to have something working now. ------------- PR: https://git.openjdk.org/jdk/pull/11019 From alanb at openjdk.org Tue Nov 15 18:19:11 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 15 Nov 2022 18:19:11 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> Message-ID: <C-v7qjsby2H_3Unxk5z109bqWYTG5swuqp5MJEhgptc=.78a1caa9-0c6d-4f0e-bd8c-3757b16cb453@github.com> On Tue, 15 Nov 2022 17:36:16 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix failing serviceability tests test/jdk/ProblemList.txt line 804: > 802: > 803: # Loom, fibers branch > 804: This is left over from when Stress.java was excluded so we can remove it from the PR. 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From kvn at openjdk.org Tue Nov 15 18:32:01 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Nov 2022 18:32:01 GMT Subject: RFR: 8295952: Problemlist existing compiler/rtm tests also on x86 In-Reply-To: <mkKq-X8pqY91OIEeoTxdeXvPr3Xsuk2P1IGcVaB0mt0=.b8c17931-2d34-49cd-b22e-03cf8d23ce33@github.com> References: <mkKq-X8pqY91OIEeoTxdeXvPr3Xsuk2P1IGcVaB0mt0=.b8c17931-2d34-49cd-b22e-03cf8d23ce33@github.com> Message-ID: <FPff6e4lq6xVejleKLbVZF-G4Jt6O7CGaDGIFXAM0v0=.797b056d-5b3c-4fde-a86b-06e583c37f96@github.com> On Wed, 26 Oct 2022 16:43:26 GMT, zzambers <duke at openjdk.org> wrote: > Problemlist should be extended so that existing compiler/rtm entries include x86 (32-bit) intel builds as well, as these are also affected. Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10875 From mcimadamore at openjdk.org Tue Nov 15 18:47:39 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 18:47:39 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v27] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix typo in SegmentScope javadoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/5f60d052..876587c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=25-26 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From sviswanathan at openjdk.org Tue Nov 15 18:52:08 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 15 Nov 2022 18:52:08 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <StEWkgMGzlkdQmzsr4KKuqQSuTLVs37-zK573UyNJq0=.f24dacd9-cf4a-46e8-8e34-6a4a0fc1db24@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> <TTQu6swuxVommBo8lFp6Bwsu8ZmLiPQEHOXjdRXSBPM=.4f697830-b114-46e6-a383-44423bae6b98@github.com> <RuibdoRTS-4oOmAB4RgZOAdm6MsFBtNsghvVLjIveZU=.22231fd2-2b30-47d6-81f9-c0b0956a1040@github.com> <StEWkgMGzlkdQmzsr4KKuqQSuTLVs37-zK573UyNJq0=.f24dacd9-cf4a-46e8-8e34-6a4a0fc1db24@github.com> Message-ID: <jcVqn-S4_uZMIc3D1N_bJQYxIfC5Yr1sv_4fkAjMFBc=.3b04268d-0dfe-4779-b793-901066a5c22e@github.com> On Tue, 15 Nov 2022 17:57:35 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote: > Do we have other intrinsics which use LEA (not for this fix)? There is a VM_Version::supports_fast_2op_lea() and VM_Version::supports_fast_3op_lea() check available which is used to do lea optimizations. 
------------- PR: https://git.openjdk.org/jdk/pull/11054 From coleenp at openjdk.org Tue Nov 15 18:52:37 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Nov 2022 18:52:37 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v6] In-Reply-To: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> Message-ID: <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> > The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. > Tested with tier 1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Review comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11033/files - new: https://git.openjdk.org/jdk/pull/11033/files/5bfec2e9..b96186ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11033&range=04-05 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/11033.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11033/head:pull/11033 PR: https://git.openjdk.org/jdk/pull/11033 From coleenp at openjdk.org Tue Nov 15 18:52:39 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Nov 2022 18:52:39 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v5] In-Reply-To: <y02Dl87_g-dbnGjXtN0DOD3fgZy7PTPPLRVENr6man4=.ceaaf2a3-6b7c-473a-8a99-6117ed1b6ae9@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com> 
<y02Dl87_g-dbnGjXtN0DOD3fgZy7PTPPLRVENr6man4=.ceaaf2a3-6b7c-473a-8a99-6117ed1b6ae9@github.com> Message-ID: <k04UDqZ_KT5D9M1DuuLVX7wI3EuE9uAnPIltj1IuIEo=.ad482f38-ed5d-4bff-bebf-d1cace670212@github.com> On Tue, 8 Nov 2022 23:35:47 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Forgot a null check. > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 540: > >> 538: >> 539: jthread * >> 540: JvmtiEnvBase::new_jthreadArray(int length, Handle *handles) { > > Shouldn't this method need to cast the return value to `jthread*`? And potentially shouldn't all the jobject's now be jthread's? jthread and jthreadGroup are typedefs to jobject in JVMTI spec. https://docs.oracle.com/en/java/javase/11/docs/specs/jvmti.html#jthread But I updated the code to have the more specific types. ------------- PR: https://git.openjdk.org/jdk/pull/11033 From coleenp at openjdk.org Tue Nov 15 18:52:40 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Nov 2022 18:52:40 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v5] In-Reply-To: <ivQj42oRzyo1GO_8_6nBpuld9wMZP0lN4oHpnURLO3Y=.efd44aa2-4d2d-4964-9541-b4998376dbef@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com> <8IJbhIid1KAN_8xogA0fSvh0RIbYhzPBB5sMBVgQkBw=.ed314adf-1e6e-4ba8-a263-ede8118d4107@github.com> <A3Qkq7t-qMy-88e_UoN0ka3WS5W2Eu0T9gVJ_-KtcY4=.248f7d7b-7a90-4ef1-8f9a-2f07296e5afc@github.com> <ivQj42oRzyo1GO_8_6nBpuld9wMZP0lN4oHpnURLO3Y=.efd44aa2-4d2d-4964-9541-b4998376dbef@github.com> Message-ID: <mOUnMVVsvaR9AQAgKSY2zsgI_GSKJ82G0wrjixq4qbI=.72b8d342-4199-460d-ae32-ac1d92b51ca4@github.com> On Wed, 9 Nov 2022 11:54:55 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> I don't think this has changed. 
Right now, if there are no child subgroups then *group_count_ptr will be 0 and *groups_ptr will be NULL as there is no memory to deallocate. JVMTI Deallocate is specified to do nothing when called with NULL. > > Alan, you are right. This check existed before. Thanks Alan, yes, I didn't change the null return. ------------- PR: https://git.openjdk.org/jdk/pull/11033 From coleenp at openjdk.org Tue Nov 15 18:52:41 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Nov 2022 18:52:41 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v5] In-Reply-To: <WzN2VJ3a7ykSEbd4wO7UpXsrpFpen5yCPj591phW1q8=.efef7e44-daab-4ace-8850-e68b8d87c531@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <Hxb6ErC2nl80N8zPTKSChG2566N7MqwogJkbYOGr38A=.7425a345-d92f-4089-8c14-10325bd15986@github.com> <WzN2VJ3a7ykSEbd4wO7UpXsrpFpen5yCPj591phW1q8=.efef7e44-daab-4ace-8850-e68b8d87c531@github.com> Message-ID: <g5Ds3gE4PAn04gVsKXC8YhPE7Hp2oHwC1wR7btOi6OI=.4101ed0c-8205-4f55-8a68-8e2968501e80@github.com> On Wed, 9 Nov 2022 09:19:33 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Forgot a null check. > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 564: > >> 562: >> 563: for (int i=0; i<length; i++) { >> 564: objArray[i] = JNIHandles::make_local(groups->obj_at(i)); > > Nit: Spaces are missed around '=' and '<' signs. fixed. 
------------- PR: https://git.openjdk.org/jdk/pull/11033 From kvn at openjdk.org Tue Nov 15 19:21:02 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Nov 2022 19:21:02 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <jcVqn-S4_uZMIc3D1N_bJQYxIfC5Yr1sv_4fkAjMFBc=.3b04268d-0dfe-4779-b793-901066a5c22e@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> <TTQu6swuxVommBo8lFp6Bwsu8ZmLiPQEHOXjdRXSBPM=.4f697830-b114-46e6-a383-44423bae6b98@github.com> <RuibdoRTS-4oOmAB4RgZOAdm6MsFBtNsghvVLjIveZU=.22231fd2-2b30-47d6-81f9-c0b0956a1040@github.com> <StEWkgMGzlkdQmzsr4KKuqQSuTLVs37-zK573UyNJq0=.f24dacd9-cf4a-46e8-8e34-6a4a0fc1db24@github.com> <jcVqn-S4_uZMIc3D1N_bJQYxIfC5Yr1sv_4fkAjMFBc=.3b04268d-0dfe-4779-b793-901066a5c22e@github.com> Message-ID: <mEoed1ixMTaOm5oV3JGyPYTu5A-GlV3evOqMKuEztUM=.8b2aff81-13bd-42de-9fb9-6add08b8309b@github.com> On Tue, 15 Nov 2022 18:48:14 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote: > > Do we have other intrinsics which use LEA (not for this fix)? > > There is a VM_Version::supports_fast_2op_lea() and VM_Version::supports_fast_3op_lea() check available which is used to do lea optimizations. Thank you @sviswa7 For this fix, based on the Ice Lake data provided by @yftsai, the potential help from `supports_fast_3op_lea()` is not enough to justify the increased code complexity. Maybe in other places it would be more useful, but not here IMHO.
------------- PR: https://git.openjdk.org/jdk/pull/11054 From mcimadamore at openjdk.org Tue Nov 15 19:27:08 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 15 Nov 2022 19:27:08 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> Message-ID: <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> On Tue, 15 Nov 2022 17:36:16 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix failing serviceability tests src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 43: > 41: /** > 42: * A value that is set once and is then available for reading for a bounded period of > 43: * execution by a thread. A {@code ScopedValue} allows for safely and efficiently sharing by a thread, or by one or more threads? (when inherited) src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 160: > 158: * record. > 159: * > 160: * <p>For this incubator release, we have provided some system properties Maybe it would be better to frame this as "The reference implementation provides some system properties". The term "reference implementation" is used elsewhere to define JDK specific mechanisms that might, or might not carry across to other JVM/Java SE API implementations. src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 172: > 170: * must be an integer power of 2. 
> 171: * > 172: * <p>For example, you could use {@code -Djdk.incubator.concurrent.ScopedValue.cacheSize=8}. I would also avoid "you" and "we" in the javadoc. While javadoc is not formal, we often use locutions such as "clients can use/do XYZ". src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 180: > 178: * thread preserves its scoped-value cache when blocked. Like {@code > 179: * ScopedValue.cacheSize}, this is a space versus speed trade-off: if > 180: * you have a great many virtual threads that are blocked most of the "in situations where many virtual threads are blocked most of the time, ..." src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 185: > 183: * would have to be regenerated after a blocking operation. > 184: * > 185: * @param <T> the type of the value Suggestion: * @param <T> the type of the scoped value src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 354: > 352: > 353: /** > 354: * Calls a value returning operation with each scoped value in this mapping bound Suggestion: * Calls a value-returning operation with each scoped value in this mapping bound src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 460: > 458: * } > 459: * > 460: * @param key the ScopedValue key should use `@code` or `@link` src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 463: > 461: * @param value the value, can be {@code null} > 462: * @param <T> the type of the value > 463: * @return a new Carrier with a single mapping same here src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 470: > 468: > 469: /** > 470: * Calls a value returning operation with a {@code ScopedValue} bound to a value Suggestion: * Calls a value-returning operation with a {@code ScopedValue} bound to a value
src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 490: > 488: * } > 489: * > 490: * @param key the ScopedValue Again, missing `@code` - please check all methods src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 613: > 611: * @return the value of the scoped value if bound, otherwise {@code other} > 612: */ > 613: public T orElse(T other) { From an API perspective, wouldn't returning `Optional` be a more consistent choice? `Optional` has all methods we need to deal with this kind of stuff... src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructuredTaskScope.java line 229: > 227: * <li> Inheritance of {@linkplain ScopedValue scoped values} across threads. > 228: * <li> Confinement checks. The phrase "threads contained in the task scope" in method > 229: * descriptions means threads started in the task scope or descendant scopes. sadly, the term "descendant scopes" is not defined elsewhere in this javadoc src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructuredTaskScope.java line 233: > 231: * > 232: * <p> The following example demonstrates the inheritance of a scoped value. A scoped > 233: * value {@code USERNAME} is bound to the value "duke". A StructuredTaskScope is created Missing `@code` or `@link` for `StructuredTaskScope` src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructuredTaskScope.java line 237: > 235: * The thread inherits the scoped value <em>bindings</em> captured when creating the > 236: * task scope. The code in {@code childTask} uses the value of the scoped value and so > 237: * reads the value "duke". Suggestion: * reads the value {@code "duke"}.
------------- PR: https://git.openjdk.org/jdk/pull/10952 From sviswanathan at openjdk.org Tue Nov 15 19:33:00 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 15 Nov 2022 19:33:00 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> Message-ID: <WcrudfBx9Af_qeGVjGq6X_Ks1ADLmxtX6G4NZR1NqdU=.228512d3-68b4-4c91-833b-2e164d3320c9@github.com> On Wed, 9 Nov 2022 07:57:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote: > The LEA instruction loads the effective address, but MD5 intrinsic uses it for computing values than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. > > This change replaces > LEA: r1 = r1 + rsi * 1 + t > with > ADDs: r1 += t; r1 += rsi. > > Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. > > No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. > > Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake. Marked as reviewed by sviswanathan (Reviewer). 
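For readers following the LEA versus ADD discussion, here is a minimal Java sketch of why the two instruction sequences quoted above are interchangeable for the MD5 step: 32-bit addition wraps around, so `r1 + rsi * 1 + t` and the pair `r1 += t; r1 += rsi` always yield the same value. It says nothing about cycle counts, which is the actual point of the PR. The class `Md5AddSketch` and its method names are hypothetical and are not taken from the HotSpot stub.

```java
// Illustration only: models the arithmetic of the two instruction sequences,
// not the generated machine code. All names here are hypothetical.
class Md5AddSketch {

    // Single-expression form, mirroring what the 3-operand LEA computes:
    // r1 = r1 + rsi * 1 + t
    static int leaForm(int r1, int rsi, int t) {
        return r1 + rsi * 1 + t;
    }

    // Two dependent additions, mirroring the replacement sequence:
    // r1 += t; r1 += rsi
    static int addForm(int r1, int rsi, int t) {
        r1 += t;
        r1 += rsi;
        return r1;
    }

    public static void main(String[] args) {
        int r1 = 0x67452301;   // MD5 initial state word A
        int rsi = 0x80000000;  // arbitrary stand-in for the second addend
        int t = 0xd76aa478;    // first MD5 round constant

        // Java int arithmetic wraps modulo 2^32, like the 32-bit registers.
        System.out.println(leaForm(r1, rsi, t) == addForm(r1, rsi, t));
    }
}
```

Since the results are identical, the choice between the two forms is purely a performance question, which is why the thread relies on microbenchmark data per microarchitecture.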
------------- PR: https://git.openjdk.org/jdk/pull/11054 From sviswanathan at openjdk.org Tue Nov 15 19:33:05 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 15 Nov 2022 19:33:05 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <mEoed1ixMTaOm5oV3JGyPYTu5A-GlV3evOqMKuEztUM=.8b2aff81-13bd-42de-9fb9-6add08b8309b@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> <TTQu6swuxVommBo8lFp6Bwsu8ZmLiPQEHOXjdRXSBPM=.4f697830-b114-46e6-a383-44423bae6b98@github.com> <RuibdoRTS-4oOmAB4RgZOAdm6MsFBtNsghvVLjIveZU=.22231fd2-2b30-47d6-81f9-c0b0956a1040@github.com> <StEWkgMGzlkdQmzsr4KKuqQSuTLVs37-zK573UyNJq0=.f24dacd9-cf4a-46e8-8e34-6a4a0fc1db24@github.com> <jcVqn-S4_uZMIc3D1N_bJQYxIfC5Yr1sv_4fkAjMFBc=.3b04268d-0dfe-4779-b793-901066a5c22e@github.com> <mEoed1ixMTaOm5oV3JGyPYTu5A-GlV3evOqMKuEztUM=.8b2aff81-13bd-42de-9fb9-6add08b8309b@github.com> Message-ID: <3hlcUrGv9rVz2BWY2uKlObuyAXSD2BjDViGR3v57z-Q=.f1966c4b-cf39-48ac-b627-6a7001c65acc@github.com> On Tue, 15 Nov 2022 19:19:01 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: > > > Do we have other intrinsics which use LEA (not for this fix)? > > > > > > There is a VM_Version::supports_fast_2op_lea() and VM_Version::supports_fast_3op_lea() check available which is used to do lea optimizations. > > Thanks you @sviswa7 > > For this fix, based on IceLake data provided by @yftsai, `supports_fast_3op_lea()` potential help is not enough to justify increase complexity of code. May be in other places it would be more useful but not here IMHO. Yes, I agree. The PR looks good to me. 
------------- PR: https://git.openjdk.org/jdk/pull/11054 From duke at openjdk.org Tue Nov 15 19:43:17 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Nov 2022 19:43:17 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> Message-ID: <5Jz1ZjH_bvH1Imw-Dptwrg9vZFA9lP8PNxnUWjnCru8=.18040fe2-6520-425a-8836-fe382a1e2f34@github.com> On Tue, 15 Nov 2022 00:16:19 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's review >> - live review with Sandhya >> - jcheck >> - Sandhya's review >> - fix windows and 32b linux builds >> - add getLimbs to interface and reviews >> - fix 32-bit build >> - make UsePolyIntrinsics option diagnostic >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - ... and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 103: > >> 101: >> 102: ATTRIBUTE_ALIGNED(64) uint64_t POLY1305_MASK44[] = { >> 103: // OFFSET 64: mask_44 > > Redundant comment. done > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 987: > >> 985: >> 986: // Load R into r1:r0 >> 987: poly1305_limbs(R, r0, r1, r1, true); > > What's the intention here when you pass `r1` twice? Just load `R[0]` and `R[2]`. 
You could use `noreg` to mark an optional operation and check for it in `poly1305_limbs` before loading the corresponding element. ah, I was wondering how to make an 'optional reg' when parameter is not a pointer. `noreg` is exactly what I needed, thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Tue Nov 15 19:43:18 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Nov 2022 19:43:18 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <OfSiSh0ho8oFGm5lOgCGxw84XjHzQHdXvyDOcZBQBXo=.f243574a-c6aa-47b2-8bb1-7951232bf83d@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> <-JVYIHKOY_LuVTqyH5xuubtPdk8pK_wi5z-8pestRis=.e63938ab-0ac2-4880-8238-e6e6d8debf03@github.com> <OfSiSh0ho8oFGm5lOgCGxw84XjHzQHdXvyDOcZBQBXo=.f243574a-c6aa-47b2-8bb1-7951232bf83d@github.com> Message-ID: <9p2RTAI9FPWstQu0OtpSmSB7dqhFwmxbw86zZQg4GtU=.1be10660-ee5d-4654-9d4e-4fe3e449fd9b@github.com> On Tue, 15 Nov 2022 00:45:54 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> library_call.cpp takes care of that, it passes the address of 0'th element to the stub. > > Ah, got it. Worth elaborating that in the comments. Otherwise, they confuse rather than help: > > // void processBlocks(byte[] input, int len, int[5] a, int[5] r) > const Register input = rdi; //input+offset > const Register length = rbx; > const Register accumulator = rcx; > const Register R = r8; Added a comment, hopefully less confusing. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Tue Nov 15 19:43:11 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Nov 2022 19:43:11 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <7hXP-vwxc6J7fklu8QuJqiIcSQRff-QyR1SZ0Fzfqmc=.33a38a51-38c3-451a-a756-ed538507f04e@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 
100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Vladimir's review comments - Merge remote-tracking branch 'origin/master' into avx512-poly - Merge remote-tracking branch 'origin/master' into avx512-poly - Vladimir's review - live review with Sandhya - jcheck - Sandhya's review - fix windows and 32b linux builds - add getLimbs to interface and reviews - fix 32-bit build - ... and 15 more: https://git.openjdk.org/jdk/compare/7357a1a3...8f5942d9 ------------- Changes: https://git.openjdk.org/jdk/pull/10582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=16 Stats: 1859 lines in 32 files changed: 1823 ins; 3 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Tue Nov 15 19:43:18 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Nov 2022 19:43:18 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <i2KXAYD9RaayPF684xwQgBredelBO5O6oyO28aETB0E=.329c21c6-84cb-4354-849d-0fdf8ca19e59@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> <i2KXAYD9RaayPF684xwQgBredelBO5O6oyO28aETB0E=.329c21c6-84cb-4354-849d-0fdf8ca19e59@github.com> Message-ID: <r_1O7IZ10L42qLo58Al4P7vopxNimqQf_JGXTozhSrQ=.02f7ac95-6ee2-409c-8730-649eae6835fe@github.com> On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> 
src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384: >> >>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const Register a0, const Register a1, const Register a2, bool only128) >>> 383: { >>> 384: const Register t1 = r13; >> >> Please, make the temps explicit and lift them into arguments. Otherwise, it's hard to see what registers are clobbered when helper methods are called. > > Thanks for pointing this out.. I spent quite a bit of time and went back and forth on 'register allocation'... it does make sense to pass all the temps needed, when the number of temps is small. This is the case for the three `*_limbs_*` functions. Maybe I should indeed do that... > > On other hand, there are functions like `poly1305_multiply8_avx512` and `poly1305_process_blocks_avx512` that use a _lot_ of temp registers. I think it makes sense to keep those as 'function-header declarations'. > > Then there are functions like `poly1305_multiply_scalar` that could go either way, has some temps and 'implicitly clobbered' registers, but probably should stay 'as is'.. > > I ended up being 'pedantic' and making _all_ temps into 'header variables'. I also tried to comment, but those probably mean more to me then anyone else in hindsight? > > > // Register Map: > // GPRs: > // input = rdi > // length = rbx > // accumulator = rcx > // R = r8 > // a0 = rsi > // a1 = r9 > // a2 = r10 > // r0 = r11 > // r1 = r12 > // c1 = r8; > // t1 = r13 > // t2 = r14 > // t3 = r15 > // t0 = r14 > // rscratch = r13 > // stack(rsp, rbp) > // imul(rax, rdx) > // ZMMs: > // T: xmm0-6 > // C: xmm7-9 > // A: xmm13-18 > // B: xmm19-24 > // R: xmm25-29 > ... 
> // Register Map: > // reserved: rsp, rbp, rcx > // PARAMs: rdi, rbx, rsi, r8-r12 > // poly1305_multiply_scalar clobbers: r13-r15, rax, rdx > const Register t0 = r14; > const Register t1 = r13; > const Register rscratch = r13; > > // poly1305_limbs_avx512 clobbers: xmm0, xmm1 > // poly1305_multiply8_avx512 clobbers: xmm0-xmm6 > const XMMRegister T0 = xmm2; > ... > > > I think I am ok changing the `*limbs*` functions (even started, before I remembered my train of thought from months back..) but let me know if you agree with the rest of the reasoning? Changed just the three `*limbs*` functions. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Tue Nov 15 19:46:48 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Nov 2022 19:46:48 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v18] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <cFcXfBYId3XkRceQX9YEy9sO_uiloIorhRVmCq0448I=.2ae3dc2a-e8d7-4808-abdf-a786369e1dae@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 
110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: extra whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/8f5942d9..58488f42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=16-17 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Tue Nov 15 19:46:52 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Nov 2022 19:46:52 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <XAAaQDoWmjbJ0coZZaBwpzxtDMbazRvX6yFPNexV3j4=.93142a64-e120-4f01-8801-d4a0f135c8f7@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <XAAaQDoWmjbJ0coZZaBwpzxtDMbazRvX6yFPNexV3j4=.93142a64-e120-4f01-8801-d4a0f135c8f7@github.com> 
Message-ID: <XZfwH5OtUeO0O3EKFTF3dKxMftTrL29UNpVHIDgLkio=.c0551eb8-c53f-4fa5-b6de-e457ffc96455@github.com>

On Tue, 15 Nov 2022 00:43:16 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits:
>>
>> - Merge remote-tracking branch 'origin/master' into avx512-poly
>> - Vladimir's review
>> - live review with Sandhya
>> - jcheck
>> - Sandhya's review
>> - fix windows and 32b linux builds
>> - add getLimbs to interface and reviews
>> - fix 32-bit build
>> - make UsePolyIntrinsics option diagnostic
>> - Merge remote-tracking branch 'origin/master' into avx512-poly
>> - ... and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db
>
> src/hotspot/share/opto/library_call.cpp line 6976:
>
>> 6974:
>> 6975: if (!stubAddr) return false;
>> 6976: Node* input = argument(1);
>
> Receiver null check is missing. Since the method being intrinsified is non-static, the intrinsic itself has to take care of receiver null check.

I think I found the right code to copy-paste; could you please check again?
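
The Java-level behaviour behind the review comment above can be shown with a small sketch. It is not the JDK's Poly1305 code: the `Mac` class and its `processBlocks` method are hypothetical stand-ins. Invoking an instance method on a null receiver must throw `NullPointerException` even when the method body never touches `this`, so a compiler intrinsic that replaces such a call has to perform that check itself to stay equivalent.

```java
final class ReceiverNullCheck {
    // Hypothetical stand-in for a class with a non-static method that a JIT might intrinsify.
    static final class Mac {
        int processBlocks(byte[] input, int len) {
            return len; // the body never reads 'this'
        }
    }

    // Returns true if calling an instance method through a null receiver throws NPE.
    static boolean throwsOnNullReceiver() {
        Mac mac = null;
        try {
            mac.processBlocks(new byte[16], 16);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnNullReceiver()); // prints "true"
    }
}
```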
------------- PR: https://git.openjdk.org/jdk/pull/10582 From coleenp at openjdk.org Tue Nov 15 20:08:16 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Nov 2022 20:08:16 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <LHYAvNfGmeYsnhAtoq3q_6FGQZBQ18pJuOnXJ4QAEgg=.b68dc2d7-5e04-4d22-852c-18a6f62acb12@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file What do you think of this release note ? https://bugs.openjdk.org/browse/JDK-8297073 ------------- PR: https://git.openjdk.org/jdk/pull/11023 From duke at openjdk.org Tue Nov 15 20:09:41 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Nov 2022 20:09:41 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v19] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <n0xQtihBTiXQpZr-I8R9T6yWou774yVTsOaJFxBp9ak=.531509bc-0c30-47f0-8bec-a412d4052664@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. 
> > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 
56.147 ops/s Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: use noreg properly in poly1305_limbs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/58488f42..cbf49380 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=17-18 Stats: 7 lines in 2 files changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From alanb at openjdk.org Tue Nov 15 20:14:02 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 15 Nov 2022 20:14:02 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> Message-ID: <5hAVImQZUFPiE99N8uZ5KC7YUGHEkYglmftttBxZln4=.d469624e-5ddc-4c13-bc64-ed0c73ceb5a2@github.com> On Tue, 15 Nov 2022 18:28:56 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix failing serviceability tests > > src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 43: > >> 41: /** >> 42: * A value that is set once and is then available for reading for a bounded period of >> 43: * execution by a thread. A {@code ScopedValue} allows for safely and efficiently sharing > > by a thread, or by one or more threads? 
(when inherited) I'd prefer to keep this as is because it's too much to use "threads" or mention inheritance in the first paragraph. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From alanb at openjdk.org Tue Nov 15 20:29:10 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 15 Nov 2022 20:29:10 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> Message-ID: <XEqyhW1QAJKrizAC-gOblrN7Pa1F_f2C8I5XO9RJqBM=.bc08c201-9382-43a6-a68b-b809028524a8@github.com> On Tue, 15 Nov 2022 19:19:57 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix failing serviceability tests > > src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 613: > >> 611: * @return the value of the scoped value if bound, otherwise {@code other} >> 612: */ >> 613: public T orElse(T other) { > > From an API perspective, wouldn't return `Optional` a more consistent choice? `Optional` has all methods we need to deal with this kind of stuff... This comment is on orElse but I suspect you are suggesting that get() be changed to return Optional. I think we'll need to get more feedback/usage of this API before re-visiting that. > src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructuredTaskScope.java line 229: > >> 227: * <li> Inheritance of {@linkplain ScopedValue scoped values} across threads. >> 228: * <li> Confinement checks. 
The phrase "threads contained in the task scope" in method
>> 229: * descriptions means threads started in the task scope or descendant scopes.
>
> sadly, the term "descendant scopes" is not defined elsewhere in this javadoc

Good point, we had more setup in previous iterations. @theRealAph I'll adjust this in the loom repo as I can't do it here.

-------------

PR: https://git.openjdk.org/jdk/pull/10952

From mcimadamore at openjdk.org Tue Nov 15 21:31:37 2022
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Tue, 15 Nov 2022 21:31:37 GMT
Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6]
In-Reply-To: <XEqyhW1QAJKrizAC-gOblrN7Pa1F_f2C8I5XO9RJqBM=.bc08c201-9382-43a6-a68b-b809028524a8@github.com>
References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com>
 <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com>
 <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com>
 <XEqyhW1QAJKrizAC-gOblrN7Pa1F_f2C8I5XO9RJqBM=.bc08c201-9382-43a6-a68b-b809028524a8@github.com>
Message-ID: <CRejKgLiOJHhFPCEfxGBnnNGZiFhES4Ac9Dqtgq2VWY=.12932aae-9015-4452-9a7c-4c59b1981495@github.com>

On Tue, 15 Nov 2022 20:26:37 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 613:
>>
>>> 611: * @return the value of the scoped value if bound, otherwise {@code other}
>>> 612: */
>>> 613: public T orElse(T other) {
>>
>> From an API perspective, wouldn't return `Optional` a more consistent choice? `Optional` has all methods we need to deal with this kind of stuff...
>
> This comment is on orElse but I suspect you are suggesting that get() be changed to return Optional. I think we'll need to get more feedback/usage of this API before re-visiting that.
Yes, my comment was really on `get` - that said, I note that saying get().get() would look odd (but maybe finding some other name for `ScopedValue::get`, such as `find` might work) ------------- PR: https://git.openjdk.org/jdk/pull/10952 From luhenry at openjdk.org Tue Nov 15 22:25:05 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 15 Nov 2022 22:25:05 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> Message-ID: <17xVJqTabeiuAn-pEdhiUcdfBuqknPJjqXcVW1eSdWE=.4af5a40b-237b-4ac7-8866-5292d5921754@github.com> On Wed, 9 Nov 2022 07:57:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote: > The LEA instruction loads the effective address, but MD5 intrinsic uses it for computing values than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. > > This change replaces > LEA: r1 = r1 + rsi * 1 + t > with > ADDs: r1 += t; r1 += rsi. > > Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. > > No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. > > Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake. Marked as reviewed by luhenry (Author). 
-------------

PR: https://git.openjdk.org/jdk/pull/11054

From amenkov at openjdk.org Tue Nov 15 22:49:05 2022
From: amenkov at openjdk.org (Alex Menkov)
Date: Tue, 15 Nov 2022 22:49:05 GMT
Subject: RFR: 8296265: Use modern HTML in the JVMTI spec
In-Reply-To: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com>
References: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com>
Message-ID: <fQQ2mzAi7_ZyHY4J6ZEG2fEk-BmxGneAsh1WonME3Jo=.c78e1acc-73c6-4259-b81b-5fa676918355@github.com>

On Fri, 11 Nov 2022 00:43:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> Changes:
> - removed `<b>` from TOC;
> - added CSS style for TOC (to simplify customization, currently it's empty);
> - removed `<b>` from function list (per Phase);
> - removed `<b>` from list of events;
> - introduced CSS style for bold text, replaced `<b>` tags with `<span class="bold">`;
> - updated transformation rule for `"b"` elements to use `"span class=bold"` (to handle `<b>` tags in source XML file);
> - dropped duplicate `"b"` transform.

[jvmtifiles.zip](https://github.com/openjdk/jdk/files/10016664/jvmtifiles.zip)

Unfortunately GitHub does not allow attaching HTML files, so I zipped the old and new jvmti.html and attached the zip file.
There are no changes in content, only styles.

-------------

PR: https://git.openjdk.org/jdk/pull/11099

From duke at openjdk.org Tue Nov 15 23:43:12 2022
From: duke at openjdk.org (Yi-Fan Tsai)
Date: Tue, 15 Nov 2022 23:43:12 GMT
Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 [v2]
In-Reply-To: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com>
References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com>
Message-ID: <s7LAJAHVZVMuXYargsQcp59cikpEq2BUl_8Jy9rZvSs=.a249ad52-1fcc-49a9-ac89-ae816bf604d0@github.com>

> The LEA instruction loads the effective address, but the MD5 intrinsic uses it for computing values rather than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput.
>
> This change replaces
> LEA: r1 = r1 + rsi * 1 + t
> with
> ADDs: r1 += t; r1 += rsi.
>
> Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc.
>
> No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc.
>
> Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.

Yi-Fan Tsai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge branch 'openjdk:master' into JDK-8296548
 - 8296548: Improve MD5 intrinsic for x86_64

   The LEA instruction loads the effective address, but the MD5 intrinsic uses it for computing values rather than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput.

   This change replaces
   LEA: r1 = r1 + rsi * 1 + t
   with
   ADDs: r1 += t; r1 += rsi.
Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. Similar results can also be observed in TestMD5Intrinsics and TestMD5MultiBlockIntrinsics with a more moderate improvement, e.g. ~15% improvement in throughput on Haswell. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11054/files - new: https://git.openjdk.org/jdk/pull/11054/files/6ed4348c..be07b342 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11054&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11054&range=00-01 Stats: 11165 lines in 460 files changed: 4691 ins; 4515 del; 1959 mod Patch: https://git.openjdk.org/jdk/pull/11054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11054/head:pull/11054 PR: https://git.openjdk.org/jdk/pull/11054 From vlivanov at openjdk.org Tue Nov 15 23:56:57 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Nov 2022 23:56:57 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> Message-ID: <QaZ2HofRpsSEDgL-WPjiOtuuk-WMZ7Hvfm7dgNb6OSo=.cd60e9e2-9470-488e-8889-6860b0d33d73@github.com> On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. 
>> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge remote-tracking branch 'origin/master' into avx512-poly > - Vladimir's review > - live review with Sandhya > - jcheck > - Sandhya's review > - fix windows and 32b linux builds > - add getLimbs to interface and reviews > - fix 32-bit build > - make UsePolyIntrinsics option diagnostic > - Merge remote-tracking branch 'origin/master' into avx512-poly > - ... 
and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 370: > 368: // Middle 44-bit limbs of new blocks > 369: __ vpsrlq(L1, L0, 44, Assembler::AVX_512bit); > 370: __ vpsllq(TMP2, TMP1, 20, Assembler::AVX_512bit); Any particular reason to use `TMP2` here? Can you just update `TMP1` instead (w/ `vpsllq(TMP1, TMP1, 20, Assembler::AVX_512bit);`)? ------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Tue Nov 15 23:56:59 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Nov 2022 23:56:59 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <i2KXAYD9RaayPF684xwQgBredelBO5O6oyO28aETB0E=.329c21c6-84cb-4354-849d-0fdf8ca19e59@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> <i2KXAYD9RaayPF684xwQgBredelBO5O6oyO28aETB0E=.329c21c6-84cb-4354-849d-0fdf8ca19e59@github.com> Message-ID: <6ks_fjBAWGK7eqIki9sA9oWjTOheJR-JAakGUx5t6Ro=.df7278d3-5d28-4219-819f-74c73dfb0677@github.com> On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384: >> >>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const Register a0, const Register a1, const Register a2, bool only128) >>> 383: { >>> 384: const Register t1 = r13; >> >> Please, make the temps explicit and lift them into arguments. Otherwise, it's hard to see what registers are clobbered when helper methods are called. > > Thanks for pointing this out.. I spent quite a bit of time and went back and forth on 'register allocation'... it does make sense to pass all the temps needed, when the number of temps is small. 
This is the case for the three `*_limbs_*` functions. Maybe I should indeed do that... > > On other hand, there are functions like `poly1305_multiply8_avx512` and `poly1305_process_blocks_avx512` that use a _lot_ of temp registers. I think it makes sense to keep those as 'function-header declarations'. > > Then there are functions like `poly1305_multiply_scalar` that could go either way, has some temps and 'implicitly clobbered' registers, but probably should stay 'as is'.. > > I ended up being 'pedantic' and making _all_ temps into 'header variables'. I also tried to comment, but those probably mean more to me then anyone else in hindsight? > > > // Register Map: > // GPRs: > // input = rdi > // length = rbx > // accumulator = rcx > // R = r8 > // a0 = rsi > // a1 = r9 > // a2 = r10 > // r0 = r11 > // r1 = r12 > // c1 = r8; > // t1 = r13 > // t2 = r14 > // t3 = r15 > // t0 = r14 > // rscratch = r13 > // stack(rsp, rbp) > // imul(rax, rdx) > // ZMMs: > // T: xmm0-6 > // C: xmm7-9 > // A: xmm13-18 > // B: xmm19-24 > // R: xmm25-29 > ... > // Register Map: > // reserved: rsp, rbp, rcx > // PARAMs: rdi, rbx, rsi, r8-r12 > // poly1305_multiply_scalar clobbers: r13-r15, rax, rdx > const Register t0 = r14; > const Register t1 = r13; > const Register rscratch = r13; > > // poly1305_limbs_avx512 clobbers: xmm0, xmm1 > // poly1305_multiply8_avx512 clobbers: xmm0-xmm6 > const XMMRegister T0 = xmm2; > ... > > > I think I am ok changing the `*limbs*` functions (even started, before I remembered my train of thought from months back..) but let me know if you agree with the rest of the reasoning? > On other hand, there are functions like poly1305_multiply8_avx512 and poly1305_process_blocks_avx512 that use a lot of temp registers. I think it makes sense to keep those as 'function-header declarations'. I agree with you on `poly1305_process_blocks_avx512`, but `poly1305_multiply8_avx512` already takes 8 arguments. Putting 8 more arguments for temps doesn't look prohibitive. 
> I think it makes sense to keep those as 'function-header declarations'. IMO it's not enough. Ideally, if there are any implicit usages, those should be clearly spelled out at every call site. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Tue Nov 15 23:57:00 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Nov 2022 23:57:00 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <9p2RTAI9FPWstQu0OtpSmSB7dqhFwmxbw86zZQg4GtU=.1be10660-ee5d-4654-9d4e-4fe3e449fd9b@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> <-JVYIHKOY_LuVTqyH5xuubtPdk8pK_wi5z-8pestRis=.e63938ab-0ac2-4880-8238-e6e6d8debf03@github.com> <OfSiSh0ho8oFGm5lOgCGxw84XjHzQHdXvyDOcZBQBXo=.f243574a-c6aa-47b2-8bb1-7951232bf83d@github.com> <9p2RTAI9FPWstQu0OtpSmSB7dqhFwmxbw86zZQg4GtU=.1be10660-ee5d-4654-9d4e-4fe3e449fd9b@github.com> Message-ID: <jOU2YrKbL5IN9drxis9yPT_AlliER47yfWT82oCm8_g=.92f9a11a-4f49-4897-974d-f77c5d4b6a21@github.com> On Tue, 15 Nov 2022 19:38:26 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Ah, got it. Worth elaborating that in the comments. Otherwise, they confuse rather than help: >> >> // void processBlocks(byte[] input, int len, int[5] a, int[5] r) >> const Register input = rdi; //input+offset >> const Register length = rbx; >> const Register accumulator = rcx; >> const Register R = r8; > > Added a comment, hopefully less confusing. On a second thought, passing derived pointers as arguments doesn't mix well with safepoint awareness. (And this stub eventually has to become safepoint aware.) Deriving a pointer inside the stub from a base oop and offset is trivial, recovering base oop from derived pointer is hard. 
It doesn't mean we have to address it right now. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Tue Nov 15 23:57:04 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Nov 2022 23:57:04 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17] In-Reply-To: <7hXP-vwxc6J7fklu8QuJqiIcSQRff-QyR1SZ0Fzfqmc=.33a38a51-38c3-451a-a756-ed538507f04e@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <7hXP-vwxc6J7fklu8QuJqiIcSQRff-QyR1SZ0Fzfqmc=.33a38a51-38c3-451a-a756-ed538507f04e@github.com> Message-ID: <fw96wWvrsbFqCZc16QUimZ6Cg0OhKARsTY-oeRbfT-I=.5bc6a1e2-d99d-4784-9b41-d4722793c613@github.com> On Tue, 15 Nov 2022 19:43:11 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 
1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: > > - Vladimir's review comments > - Merge remote-tracking branch 'origin/master' into avx512-poly > - Merge remote-tracking branch 'origin/master' into avx512-poly > - Vladimir's review > - live review with Sandhya > - jcheck > - Sandhya's review > - fix windows and 32b linux builds > - add getLimbs to interface and reviews > - fix 32-bit build > - ... and 15 more: https://git.openjdk.org/jdk/compare/7357a1a3...8f5942d9 src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 896: > 894: > 895: // Cleanup > 896: __ vpxorq(xmm0, xmm0, xmm0, Assembler::AVX_512bit); What's the purpose of the cleanup? src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 1004: > 1002: __ jcc(Assembler::less, L_process16Loop); > 1003: > 1004: poly1305_process_blocks_avx512(input, length, I'd like to see a comment here explaining what register effects are implicit. 
`poly1305_process_blocks_avx512` has the following comment, but it doesn't mention xmm registers: // Register Map: // reserved: rsp, rbp, rcx // PARAMs: rdi, rbx, rsi, r8-r12 // poly1305_multiply_scalar clobbers: r13-r15, rax, rdx ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Nov 16 00:08:07 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Nov 2022 00:08:07 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17] In-Reply-To: <fw96wWvrsbFqCZc16QUimZ6Cg0OhKARsTY-oeRbfT-I=.5bc6a1e2-d99d-4784-9b41-d4722793c613@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <7hXP-vwxc6J7fklu8QuJqiIcSQRff-QyR1SZ0Fzfqmc=.33a38a51-38c3-451a-a756-ed538507f04e@github.com> <fw96wWvrsbFqCZc16QUimZ6Cg0OhKARsTY-oeRbfT-I=.5bc6a1e2-d99d-4784-9b41-d4722793c613@github.com> Message-ID: <R5Z9dm4TLlM2Pgxg5_UDBgYL3IG83Ec6KGCQ3ei_qdw=.2ee65e24-aaec-4266-8378-5a109de4dbbc@github.com> On Tue, 15 Nov 2022 19:41:25 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: >> >> - Vladimir's review comments >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's review >> - live review with Sandhya >> - jcheck >> - Sandhya's review >> - fix windows and 32b linux builds >> - add getLimbs to interface and reviews >> - fix 32-bit build >> - ... and 15 more: https://git.openjdk.org/jdk/compare/7357a1a3...8f5942d9 > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 896: > >> 894: >> 895: // Cleanup >> 896: __ vpxorq(xmm0, xmm0, xmm0, Assembler::AVX_512bit); > > What's the purpose of the cleanup? The internal security review asked me to blank out all the key material after I am done. i.e. 
R (and its powers on the stack) ------------- PR: https://git.openjdk.org/jdk/pull/10582 From fyang at openjdk.org Wed Nov 16 00:56:55 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 16 Nov 2022 00:56:55 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default [v2] In-Reply-To: <mItU3V6T_c6Zj2oYscFfV1dQZuzLLpN5JeXDNhpu-RU=.2be2d6aa-62c9-44c0-9529-5c434608fe3d@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> <mItU3V6T_c6Zj2oYscFfV1dQZuzLLpN5JeXDNhpu-RU=.2be2d6aa-62c9-44c0-9529-5c434608fe3d@github.com> Message-ID: <RiwetW0o1QAgS2IXXl6dC6OnnaeDm3xtCWQcuo9gMB0=.7236a613-7d3c-434d-b507-193813492175@github.com> On Tue, 15 Nov 2022 07:07:11 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. >> >> >>> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" >> bool UseRVA20U64 = true {ARCH product} {default} >> bool UseRVC = true {ARCH product} {default} >> openjdk version "20-internal" 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html >> [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Turn off the default true UseRVA20U64 when hardware does not support C Updated change looks good. Thanks ------------- Marked as reviewed by fyang (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11155 From fyang at openjdk.org Wed Nov 16 00:57:53 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 16 Nov 2022 00:57:53 GMT Subject: RFR: 8296916: RISC-V: Move some small macro-assembler functions to header file [v2] In-Reply-To: <cUd4rSzfjzK8ol3rCs1fRMf-Lctmtoau_1yNS3NIyyM=.ed3a65d4-8c85-48f7-9810-e6d5a218e944@github.com> References: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> <4HhSvQNYeFOwFEKahHQuORkDDT7q8_Ihyb8jlGzo5aY=.cefa5cc4-f3d6-4191-bd9e-b35582752cf4@github.com> <cUd4rSzfjzK8ol3rCs1fRMf-Lctmtoau_1yNS3NIyyM=.ed3a65d4-8c85-48f7-9810-e6d5a218e944@github.com> Message-ID: <CJh69191q-h9mmX4--t9UFgxKIbgv6NZf7qHQgww31Y=.97e94a0f-9642-46ac-93c2-89ab0e0e071c@github.com> On Mon, 14 Nov 2022 02:59:38 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Review > > Change looks good. @feilongjiang @yadongw : Thanks for the review! Need a Review then. @shipilev : Want to take a look? 
-------------

PR: https://git.openjdk.org/jdk/pull/11130

From sspitsyn at openjdk.org Wed Nov 16 01:21:54 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 16 Nov 2022 01:21:54 GMT
Subject: RFR: 8296265: Use modern HTML in the JVMTI spec
In-Reply-To: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com>
References: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com>
Message-ID: <jr93FcQEx1s74y4Z4MhxOVZaOu93Ea30OUASNJm9zMw=.cea5f160-f174-495b-bdb2-c6dea40230c1@github.com>

On Fri, 11 Nov 2022 00:43:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

> Changes:
> - removed `<b>` from TOC;
> - added CSS style for TOC (to simplify customization, currently it's empty);
> - removed `<b>` from function list (per Phase);
> - removed `<b>` from list of events;
> - introduced CSS style for bold text, replaced `<b>` tags with `<span class="bold">`;
> - update transformation rule for `"b"` elements to use `"span class=bold"` (to handle `<b>` tags in source XML file);
> - dropped duplicate `"b"` transform.

Thank you for the generated old and new jvmti.html.
------------- PR: https://git.openjdk.org/jdk/pull/11099 From kvn at openjdk.org Wed Nov 16 02:22:58 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 16 Nov 2022 02:22:58 GMT Subject: RFR: 8296548: Improve MD5 intrinsic for x86_64 [v2] In-Reply-To: <s7LAJAHVZVMuXYargsQcp59cikpEq2BUl_8Jy9rZvSs=.a249ad52-1fcc-49a9-ac89-ae816bf604d0@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> <s7LAJAHVZVMuXYargsQcp59cikpEq2BUl_8Jy9rZvSs=.a249ad52-1fcc-49a9-ac89-ae816bf604d0@github.com> Message-ID: <-u2SvidKHdlnJvslxVaY27GgjVxQyHnRBkZwiJ08nwo=.d0562bb7-51fa-45b9-a085-0cb69d666f2c@github.com> On Tue, 15 Nov 2022 23:43:12 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote: >> The LEA instruction loads the effective address, but MD5 intrinsic uses it for computing values than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. >> >> This change replaces >> LEA: r1 = r1 + rsi * 1 + t >> with >> ADDs: r1 += t; r1 += rsi. >> >> Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. >> >> No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. >> >> Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake. > > Yi-Fan Tsai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8296548 > - 8296548: Improve MD5 intrinsic for x86_64 > > The LEA instruction loads the effective address, but MD5 intrinsic uses > it for computing values than addresses. 
This usage potentially uses > more cycles than ADDs and reduces the throughput. > > This change replaces > LEA: r1 = r1 + rsi * 1 + t > with > ADDs: r1 += t; r1 += rsi. > > Microbenchmark evaluation shows ~40% performance improvement on Haswell, > Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd > gen Epyc. > > No performance change for the same microbenchmark on Ice Lake and 3rd > gen Epyc. > > Similar results can also be observed in TestMD5Intrinsics and > TestMD5MultiBlockIntrinsics with a more moderate improvement, e.g. ~15% > improvement in throughput on Haswell. My testing passed. ------------- PR: https://git.openjdk.org/jdk/pull/11054 From sspitsyn at openjdk.org Wed Nov 16 03:58:30 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 16 Nov 2022 03:58:30 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v6] In-Reply-To: <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> Message-ID: <HeTLbgd9vgQNCDQ4k7CVyv4xd8dJF8H9g0APM3GOXr4=.629511e6-bb14-4bd7-b8bd-f0f34dd044ea@github.com> On Tue, 15 Nov 2022 18:52:37 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Review comments. I'm sorry for the latency. Looks good to me. I've posted one nit though. 
Thanks, Serguei src/hotspot/share/prims/jvmtiEnvBase.cpp line 564: > 562: > 563: for (int i = 0; i < length; i++) { > 564: objArray[i] = (jthreadGroup)JNIHandles::make_local(groups->obj_at(i)); Nit: It is better to use `jni_reference` instead of `JNIHandles::make_local` for consistency as at the line 549. ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/11033 From kbarrett at openjdk.org Wed Nov 16 04:58:19 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 16 Nov 2022 04:58:19 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <fWo7Fw-iffnP6tHcRUeRCy5Q92_nN2k5qG-B5zTmleU=.72f72ac3-f7ae-4c33-a7b2-04043e6952df@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> <wpIZeGRfbop46i3T4kS9jRfkIiNuXFKvo17ymbsQPEw=.573524bb-48db-4329-9571-10480ce8ee8d@github.com> <fWo7Fw-iffnP6tHcRUeRCy5Q92_nN2k5qG-B5zTmleU=.72f72ac3-f7ae-4c33-a7b2-04043e6952df@github.com> Message-ID: <2Ja9uaGY95zatV-viOoTkNbakqLkWuCDn761HztySZU=.f42b4b8a-2be1-41b1-98c0-12f953e6a88b@github.com> On Tue, 15 Nov 2022 08:31:10 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> src/hotspot/os/bsd/attachListener_bsd.cpp line 294: >> >>> 292: (atoi(buf) != ATTACH_PROTOCOL_VER)) { >>> 293: char msg[32]; >>> 294: os::snprintf(msg, sizeof(msg), "%d\n", ATTACH_ERROR_BADVERSION); >> >> Rather than using `strlen(msg)` in the next line, use the result from `os::snprintf`. > > The problem with using the return value of os::snprintf() is that we need to handle the -1 case to prevent the position from running backward. Might be better to use stringStream instead, which should handle the -1 case transparently. A result of -1 only occurs for an encoding error. An encoding error is only possible with multi-byte / wide characters. 
(See the definition of "encoding error" in C99 7.19.3/14.) We don't use those, so there won't be any encoding errors, so our uses of snprintf never return -1. ------------- PR: https://git.openjdk.org/jdk/pull/11115 From xlinzheng at openjdk.org Wed Nov 16 05:21:52 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 16 Nov 2022 05:21:52 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default [v3] In-Reply-To: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> Message-ID: <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> > The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. > > >> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" > bool UseRVA20U64 = true {ARCH product} {default} > bool UseRVC = true {ARCH product} {default} > openjdk version "20-internal" 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html > [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 > > Thanks, > Xiaolin Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: minor issue if users specify command line -XX:+UseRVA20U64 and RVC is not supported ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11155/files - new: https://git.openjdk.org/jdk/pull/11155/files/55e3dbe7..79f856ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11155&range=02 - incr: 
https://webrevs.openjdk.org/?repo=jdk&pr=11155&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11155.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11155/head:pull/11155 PR: https://git.openjdk.org/jdk/pull/11155 From duke at openjdk.org Wed Nov 16 06:16:55 2022 From: duke at openjdk.org (Yi-Fan Tsai) Date: Wed, 16 Nov 2022 06:16:55 GMT Subject: Integrated: 8296548: Improve MD5 intrinsic for x86_64 In-Reply-To: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> References: <UAjNUhH37EIqreJS3T2JOec47dnUdrVUeyALrWQkX5g=.a705e044-1fea-4467-b78c-ba446d01f11f@github.com> Message-ID: <2AnMa7kndLRDNh88RgZxmTHEHLg6tdhxPJ3WYO8ARBE=.9f962915-8fb8-4907-be2b-9e7f530fa493@github.com> On Wed, 9 Nov 2022 07:57:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote: > The LEA instruction loads the effective address, but MD5 intrinsic uses it for computing values than addresses. This usage potentially uses more cycles than ADDs and reduces the throughput. > > This change replaces > LEA: r1 = r1 + rsi * 1 + t > with > ADDs: r1 += t; r1 += rsi. > > Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc. > > No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc. > > Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake. This pull request has now been integrated. 
Changeset: 6ead2b01 Author: Yi-Fan Tsai <yifan.tsai at gmail.com> Committer: Jatin Bhateja <jbhateja at openjdk.org> URL: https://git.openjdk.org/jdk/commit/6ead2b019595f9b54a70603da84f11271ee070b6 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod 8296548: Improve MD5 intrinsic for x86_64 Reviewed-by: kvn, sviswanathan, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/11054 From duke at openjdk.org Wed Nov 16 06:34:07 2022 From: duke at openjdk.org (zzambers) Date: Wed, 16 Nov 2022 06:34:07 GMT Subject: Integrated: 8295952: Problemlist existing compiler/rtm tests also on x86 In-Reply-To: <mkKq-X8pqY91OIEeoTxdeXvPr3Xsuk2P1IGcVaB0mt0=.b8c17931-2d34-49cd-b22e-03cf8d23ce33@github.com> References: <mkKq-X8pqY91OIEeoTxdeXvPr3Xsuk2P1IGcVaB0mt0=.b8c17931-2d34-49cd-b22e-03cf8d23ce33@github.com> Message-ID: <4IZfuq0OUU6CFZjPHEaPCy-CjhRsXJQyXLmA1rg4hgA=.af884463-64e4-44e9-945c-802114f34125@github.com> On Wed, 26 Oct 2022 16:43:26 GMT, zzambers <duke at openjdk.org> wrote: > Problemlist should be extended so that existing compiler/rtm entries include x86 (32-bit) intel builds as well, as these are also affected. This pull request has now been integrated. 
Changeset: 3f2f128a Author: Zdenek Zambersky <zzambers at redhat.com> Committer: Tobias Hartmann <thartmann at openjdk.org> URL: https://git.openjdk.org/jdk/commit/3f2f128af6ec2f9097af7758bfd41aeaa4354d40 Stats: 11 lines in 1 file changed: 0 ins; 0 del; 11 mod 8295952: Problemlist existing compiler/rtm tests also on x86 Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/10875 From xuelei at openjdk.org Wed Nov 16 07:03:12 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Wed, 16 Nov 2022 07:03:12 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v7] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <lTTxdskiTr0w5EmZ0xKWQlSBCfhePMpHV18cpzWh_pE=.2da73c6b-cd06-4a9b-89ba-213fc10cb8f5@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/ca4ddcc4..f2158c8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=05-06 Stats: 24 lines in 6 files changed: 1 ins; 4 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Wed Nov 16 07:16:00 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Wed, 16 Nov 2022 07:16:00 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <nz9IPjZuX8QnocI6yC68IrCccWJHNZ4pupJJniO7hkE=.c224740b-a81b-4804-8d3a-ce98ed8e87f4@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> <nz9IPjZuX8QnocI6yC68IrCccWJHNZ4pupJJniO7hkE=.c224740b-a81b-4804-8d3a-ce98ed8e87f4@github.com> Message-ID: <Zidix36ENoGghAxz0b-yEA1WhVpRur9Sx8t13agCPQc=.7ec1a812-2eea-4c22-8693-e2981d0de263@github.com> On Tue, 15 Nov 2022 05:52:18 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: >> >> delete swp file > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 226: > >> 224: char buf[512]; >> 225: os::snprintf(buf, sizeof(buf), "0x%02x:0x%x:0x%03x:%d", _cpu, _variant, _model, _revision); >> 226: if (_model2) os::snprintf(buf+strlen(buf), sizeof(buf) - strlen(buf), "(0x%03x)", _model2); > > Here - and in several other places, where we construct a string from multiple parts - the code would be a 
simpler with `stringStream`:
>
>
>     char buf[512];
>     stringStream ss(buf, sizeof(buf));
>     ss.print("0x%02x:0x%x:0x%03x:%d", _cpu, _variant, _model, _revision);
>     if (_model2) ss.print("(0x%03x)", _model2);
>     _features_string = os::strdup(buf);
>
> or, using `stringStream`s internal buffer:
>
>
>     stringStream ss;
>     ss.print("0x%02x:0x%x:0x%03x:%d", _cpu, _variant, _model, _revision);
>     if (_model2) ss.print("(0x%03x)", _model2);
>     _features_string = ss.base();
>
>
> No manual offset counting required.
>
> I leave it up to you if you do it that way. The code here is correct as it is.

Glad to know that `stringStream` could make this block safer and easier. I learned a lot from the reviewers of this PR. It looks like we can also use stringStream for other files touched in this PR. I would like to keep this update focused on simply replacing `sprintf`. I may file a new PR for the `stringStream` improvement shortly after.

> src/hotspot/share/classfile/javaClasses.cpp line 2562:
>
>> 2560:     CompiledMethod* nm = method->code();
>> 2561:     if (WizardMode && nm != NULL) {
>> 2562:       os::snprintf(buf + buf_off, buf_size - buf_off, "(nmethod " INTPTR_FORMAT ")", (intptr_t)nm);
>
> I think you should update `buf_off` here, because now you overwrite the last text part. Weird that no test caught that.
>
> All this code here in javaClasses.cpp would benefit from using stringStream.

Oooops! Good catch. `buf_off` is updated in the new commit. I think `stringStream` could be a better solution. I will do it in a follow-up PR.

> src/hotspot/share/utilities/utf8.cpp line 521:
>
>> 519:       } else {
>> 520:         if (p + 6 >= end) break;      // string is truncated
>> 521:         os::snprintf(p, 7, "\\u%04x", c);
>
> This should be 6, or? We have 6 characters left before end, assuming end is exclusive.
>
> Also, maybe use a named constant?

If 6 is used, there is an output-truncated warning and only 5 characters are actually filled in. The terminating null/zero is counted in, I think.
To make it easier to read, I added a comment.

> src/java.desktop/macosx/native/libjsound/PLATFORM_API_MacOSX_Ports.cpp line 638:
>
>> 636:         return;
>> 637:     }
>> 638:     snprintf(channelName, 16, "Ch %d", ch);
>
> Can we use a constant here instead of literal 16?

To be honest, I don't know the logic of the code yet. I'm not sure how to name the literal 16 yet.

-------------

PR: https://git.openjdk.org/jdk/pull/11115

From xuelei at openjdk.org Wed Nov 16 07:16:02 2022
From: xuelei at openjdk.org (Xue-Lei Andrew Fan)
Date: Wed, 16 Nov 2022 07:16:02 GMT
Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6]
In-Reply-To: <2Ja9uaGY95zatV-viOoTkNbakqLkWuCDn761HztySZU=.f42b4b8a-2be1-41b1-98c0-12f953e6a88b@github.com>
References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com>
 <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com>
 <wpIZeGRfbop46i3T4kS9jRfkIiNuXFKvo17ymbsQPEw=.573524bb-48db-4329-9571-10480ce8ee8d@github.com>
 <fWo7Fw-iffnP6tHcRUeRCy5Q92_nN2k5qG-B5zTmleU=.72f72ac3-f7ae-4c33-a7b2-04043e6952df@github.com>
 <2Ja9uaGY95zatV-viOoTkNbakqLkWuCDn761HztySZU=.f42b4b8a-2be1-41b1-98c0-12f953e6a88b@github.com>
Message-ID: <HyLZBgVxonD9zj8kI96k0aZnWSO9Yk_-cxmAVuW0ZNM=.04766c2b-469e-424d-af7d-d2c428181bc3@github.com>

On Wed, 16 Nov 2022 04:55:17 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> The problem with using the return value of os::snprintf() is that we need to handle the -1 case to prevent the position from running backward. Might be better to use stringStream instead, which should handle the -1 case transparently.
>
> A result of -1 only occurs for an encoding error. An encoding error is only
> possible with multi-byte / wide characters. (See the definition of "encoding
> error" in C99 7.19.3/14.) We don't use those, so there won't be any encoding
> errors, so our uses of snprintf never return -1.

Updated to use the result from `os::snprintf` in the new commit.
------------- PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Wed Nov 16 07:16:02 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Wed, 16 Nov 2022 07:16:02 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <wpIZeGRfbop46i3T4kS9jRfkIiNuXFKvo17ymbsQPEw=.573524bb-48db-4329-9571-10480ce8ee8d@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> <wpIZeGRfbop46i3T4kS9jRfkIiNuXFKvo17ymbsQPEw=.573524bb-48db-4329-9571-10480ce8ee8d@github.com> Message-ID: <UEwLnUsMZkahUP1vUzNE76zEe-LPIM94coD5euVwwMM=.a6e8138f-6d58-41d5-9b9b-80c30a8a0340@github.com> On Tue, 15 Nov 2022 07:04:38 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: >> >> delete swp file > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 226: > >> 224: char buf[512]; >> 225: os::snprintf(buf, sizeof(buf), "0x%02x:0x%x:0x%03x:%d", _cpu, _variant, _model, _revision); >> 226: if (_model2) os::snprintf(buf+strlen(buf), sizeof(buf) - strlen(buf), "(0x%03x)", _model2); > > Instead of using `strlen(buf)` (now called twice!) to get the number of characters written, use the result of the first call to `os::snprintf`. Good point! Updated in the new commit. > src/hotspot/os/bsd/attachListener_bsd.cpp line 251: > >> 249: BsdAttachOperation* BsdAttachListener::read_request(int s) { >> 250: char ver_str[8]; >> 251: os::snprintf(ver_str, sizeof(ver_str), "%d", ATTACH_PROTOCOL_VER); > > We later use `strlen(ver_str)` where we could instead use the result of `os::snprintf`. I think it is safe to use the result of `os::snprintf` for the computation of max_len of the buf, isn't it? 
- const int max_len = (sizeof(ver_str) + 1) + (AttachOperation::name_length_max + 1) + + const int max_len = (ver_str_len + 1) + (AttachOperation::name_length_max + 1) + AttachOperation::arg_count_max*(AttachOperation::arg_length_max + 1); > src/hotspot/os/bsd/attachListener_bsd.cpp line 414: > >> 412: // write operation result >> 413: char msg[32]; >> 414: os::snprintf(msg, sizeof(msg), "%d\n", result); > > Rather than using strlen(msg) in the next line, use the result from os::snprintf. Updated to use the result from os::snprintf in the new commit. > src/hotspot/share/classfile/javaClasses.cpp line 2532: > >> 2530: // Print module information >> 2531: if (module_name != NULL) { >> 2532: buf_off = (int)strlen(buf); > > `buf_off` could be the result of `os::snprintf` instead of calling `strlen`. Updated to use the result of `os::snprintf` in the new commit. > src/hotspot/share/code/dependencies.cpp line 780: > >> 778: } >> 779: } else { >> 780: char xn[12]; os::snprintf(xn, sizeof(xn), "x%d", j); > > Pre-existing very unusual formatting; put a line break between the statements. Yes. Updated in the new commit. 
------------- PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Wed Nov 16 07:16:03 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Wed, 16 Nov 2022 07:16:03 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v4] In-Reply-To: <74-iTHarZs4dtXp8dqJEKYrXRw24WAQHrUorBJ4Tmvc=.e2bb3e95-c538-42cb-94a6-6d3378d5bdab@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <TwmQmg7Canmom_CSvAthOQIbBZaMPLXLfgaudOsoZD0=.bfb6f137-df79-40cf-b6d6-89b975832d66@github.com> <74-iTHarZs4dtXp8dqJEKYrXRw24WAQHrUorBJ4Tmvc=.e2bb3e95-c538-42cb-94a6-6d3378d5bdab@github.com> Message-ID: <IasO0sh7W6M_8SxQsmirQ3sQooOpUhzrzjIOUDBZ9u8=.0c045a5a-d4fe-4de5-9583-e1e5f92644e7@github.com> On Mon, 14 Nov 2022 16:53:07 GMT, Lutz Schmidt <lucy at openjdk.org> wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: >> >> include missing os head file > > src/hotspot/share/adlc/output_c.cpp line 536: > >> 534: int printed = snprintf(args, 37, "0x%x, 0x%x, %u", >> 535: resources_used, resources_used_exclusively, element_count); >> 536: assert(printed <= 36, "overflow"); > > if snprintf works correctly (we rely on that), this assert will never fire. Good point. I removed the assert in the new commit. 
------------- PR: https://git.openjdk.org/jdk/pull/11115 From shade at openjdk.org Wed Nov 16 07:39:59 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 16 Nov 2022 07:39:59 GMT Subject: RFR: 8296916: RISC-V: Move some small macro-assembler functions to header file [v2] In-Reply-To: <4HhSvQNYeFOwFEKahHQuORkDDT7q8_Ihyb8jlGzo5aY=.cefa5cc4-f3d6-4191-bd9e-b35582752cf4@github.com> References: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> <4HhSvQNYeFOwFEKahHQuORkDDT7q8_Ihyb8jlGzo5aY=.cefa5cc4-f3d6-4191-bd9e-b35582752cf4@github.com> Message-ID: <vFrz5OWLa1nQv_kMedGi3abjJ1U3AEn4UiieFcOicdU=.f49a2a23-6326-473d-8363-6b7f2d196724@github.com> On Mon, 14 Nov 2022 02:58:20 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Witnessed that there are some small macro-assembler functions located in file macroAssembler_riscv.cpp. >> These are small functions which mostly contain only a single line of code. We should move them to the >> corresponding header file so that they have a chance to be inlined. >> >> Testing: Tier1 on linux-riscv64 HiFive unmatched board. > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Review I would have expected these to be moved to `macroAssembler_riscv.inline.hpp`, to be honest. But this is okay as well. ------------- PR: https://git.openjdk.org/jdk/pull/11130 Marked as reviewed by shade (Reviewer). 
From aturbanov at openjdk.org Wed Nov 16 08:18:11 2022 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Wed, 16 Nov 2022 08:18:11 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v3] In-Reply-To: <W6A7cfHKo0SZoEVlXxUDutY4t10ufyOxH81t_d8-9ig=.49d9813d-5f28-4986-8ffa-eb2fe1027d8a@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> <W6A7cfHKo0SZoEVlXxUDutY4t10ufyOxH81t_d8-9ig=.49d9813d-5f28-4986-8ffa-eb2fe1027d8a@github.com> Message-ID: <ItBPv6uMQa7SRXIQjcyOxn5al7vF2QkP9SIpVV6ggnY=.0bdff0b8-05d3-4025-bd52-3a88f3f68148@github.com> On Tue, 15 Nov 2022 12:52:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. >> >> The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. 
> > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright test/hotspot/jtreg/runtime/ErrorHandling/HsErrFileUtils.java line 35: > 33: * Given the output of a java VM that crashed, extract the name of the hs-err file from the output > 34: */ > 35: static public String extractHsErrFileNameFromOutput(OutputAnalyzer output) { nit: let's use blessed modifiers order Suggestion: public static String extractHsErrFileNameFromOutput(OutputAnalyzer output) { test/hotspot/jtreg/runtime/ErrorHandling/HsErrFileUtils.java line 94: > 92: if (currentPattern < patterns.length) { > 93: throw new RuntimeException("hs-err file incomplete (found " + currentPattern + " matching pattern, " + > 94: "first missing pattern: " + patterns[currentPattern] + ")"); nit Suggestion: "first missing pattern: " + patterns[currentPattern] + ")"); ------------- PR: https://git.openjdk.org/jdk/pull/11122 From luhenry at openjdk.org Wed Nov 16 08:20:09 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 16 Nov 2022 08:20:09 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default [v3] In-Reply-To: <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> Message-ID: <uTgtSaEzdKXgGGYoJLpJ2tYhDhuCg01OaiUSc5QnMvA=.d9a1a35a-efc5-47d4-bb40-a5fc557782f7@github.com> On Wed, 16 Nov 2022 05:21:52 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. 
>> >> >>> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" >> bool UseRVA20U64 = true {ARCH product} {default} >> bool UseRVC = true {ARCH product} {default} >> openjdk version "20-internal" 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html >> [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > minor issue if users specify command line -XX:+UseRVA20U64 and RVC is not supported Given the RVC is a _mandatory_ extension of `RVA20U64`, but also that `RVA20U64` can imply other extensions ([full list of mandatory extensions](https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#512-rva20u64-mandatory-extensions)), I wouldn't just disable `RVA20U64` when RVC isn't available, but I would fail completely the JVM. Getting in such case would mean that a hardware is implementing only parts of the __mandatory__ list of extensions for a given profile. I understand that it puts the approach of this PR in question though, and I'm wondering why we need to enable a whole profile when we are trying to enable only one of the feature (RVC) and we also have feature flag detection for this feature (`_features & CPU_C`). It would just be better IMO to do proper feature detection and use that to enable/disable features, just like other platforms. 
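The two policies discussed above (per-feature detection versus requiring a complete profile) can be contrasted in a small sketch. Everything here is hypothetical and illustrative: the bit values, the `RVA20U64_SUBSET` mask (only a partial subset of the mandatory extensions, not the full list linked above), and the `decide` function are assumptions, not HotSpot's actual code. Only the idea of testing `_features & CPU_C` comes from the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits; the values are illustrative, not HotSpot's.
enum : uint64_t {
  CPU_I = 1u << 0,
  CPU_M = 1u << 1,
  CPU_A = 1u << 2,
  CPU_F = 1u << 3,
  CPU_D = 1u << 4,
  CPU_C = 1u << 5
};

// Partial, illustrative subset of what a profile makes mandatory.
static const uint64_t RVA20U64_SUBSET = CPU_I | CPU_M | CPU_A | CPU_F | CPU_D | CPU_C;

struct Decision {
  bool use_rvc;        // enabled purely from the detected feature bit
  bool fail_startup;   // profile requested but hardware lacks a mandatory part
};

static Decision decide(uint64_t features, bool profile_requested) {
  assert((features & ~RVA20U64_SUBSET) == 0);  // sketch only knows these bits
  Decision d;
  // Per-feature detection: RVC depends only on its own bit.
  d.use_rvc = (features & CPU_C) != 0;
  // Profile policy: a requested profile with a missing mandatory extension
  // is treated as a hard error instead of being silently switched off.
  d.fail_startup = profile_requested &&
                   (features & RVA20U64_SUBSET) != RVA20U64_SUBSET;
  return d;
}
```

In this sketch, hardware without the C bit simply runs without RVC when no profile is requested, and is rejected only when the user explicitly asks for the profile.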
------------- PR: https://git.openjdk.org/jdk/pull/11155 From ihse at openjdk.org Wed Nov 16 08:31:56 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 16 Nov 2022 08:31:56 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: <OG6PBKDNftaAF-ybH8jebIjEs5qcomgL90FAcuxg5AU=.640a50d1-3617-4c86-8959-491226b5346a@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> <OG6PBKDNftaAF-ybH8jebIjEs5qcomgL90FAcuxg5AU=.640a50d1-3617-4c86-8959-491226b5346a@github.com> Message-ID: <EBoNLji14NTOMbkxq2Ur7yHtoz1JcGDcHi7eKrCjgmY=.7df07f07-63c3-42b0-9cff-d4b509d49109@github.com> On Tue, 18 Oct 2022 15:21:58 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Always read full filename and strip prefix path and only then cut filename to fit output buffer >> - Merge branch 'master' into JDK-8293422 >> - Merge branch 'master' into JDK-8293422 >> - Review comments from Thomas >> - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ >> - 8293422: DWARF emitted by Clang cannot be parsed > > Thanks Magnus for your review of the build changes! > > May I get a second review of the DWARF parser code changes? > > Thanks, > Christian @chhagedorn Are you waiting for an additional Hotspot review? 
------------- PR: https://git.openjdk.org/jdk/pull/10287 From shade at openjdk.org Wed Nov 16 08:32:02 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 16 Nov 2022 08:32:02 GMT Subject: RFR: 8294591: Fix cast-function-type warning in TemplateTable [v6] In-Reply-To: <YFUqNAFHWeIeZjced_rpvGUTPuEr0vnWkSgcyyB7kL4=.c89e3fb4-5122-4ed3-88c2-7215ce3a6620@github.com> References: <F9kScSH8YZ9kK8XAcl5pPPE6sPrHCBUc_zNriSaP8EU=.d5c7452c-72ed-4f41-9dea-5bd8a7fccbc0@github.com> <YFUqNAFHWeIeZjced_rpvGUTPuEr0vnWkSgcyyB7kL4=.c89e3fb4-5122-4ed3-88c2-7215ce3a6620@github.com> Message-ID: <4zL45U4twO112SLW6jqQJyRWlhAmYI7HUB1EhIKl7UQ=.1da3aac9-6118-4a23-a47f-550cdc20b530@github.com> On Tue, 15 Nov 2022 14:26:43 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> After [JDK-8294314](https://bugs.openjdk.org/browse/JDK-8294314), we would have `templateTable.cpp` excluded with cast-function-type warning. The underlying cause for it is casting functions for `ldc` bytecodes, which take `bool`-typed handlers: > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into JDK-8294591-warning-cast-function-type-templatetable > - Merge branch 'master' into JDK-8294591-warning-cast-function-type-templatetable > - Fix build failures > - Merge branch 'master' into JDK-8294591-warning-cast-function-type-templatetable > - Also disable warnings in gtests > - Fix Remerged, retested. Unless there are other comments, I'll integrate this. 
------------- PR: https://git.openjdk.org/jdk/pull/10493 From stuefe at openjdk.org Wed Nov 16 08:50:10 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 08:50:10 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v4] In-Reply-To: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> Message-ID: <bPB4PgPSiG_14IpRKCrO3QnWrIdztrOgxoycffG-9wg=.063eb16c-f6f1-4528-a43e-e17364795148@github.com> > We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. > > The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. 
Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - Remove trailing whitespace Co-authored-by: Andrey Turbanov <turbanoff at gmail.com> - blessed modifiers Co-authored-by: Andrey Turbanov <turbanoff at gmail.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11122/files - new: https://git.openjdk.org/jdk/pull/11122/files/a931ea8b..a72d834a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11122&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11122&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11122.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11122/head:pull/11122 PR: https://git.openjdk.org/jdk/pull/11122 From stuefe at openjdk.org Wed Nov 16 08:50:11 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 08:50:11 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v3] In-Reply-To: <ItBPv6uMQa7SRXIQjcyOxn5al7vF2QkP9SIpVV6ggnY=.0bdff0b8-05d3-4025-bd52-3a88f3f68148@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> <W6A7cfHKo0SZoEVlXxUDutY4t10ufyOxH81t_d8-9ig=.49d9813d-5f28-4986-8ffa-eb2fe1027d8a@github.com> <ItBPv6uMQa7SRXIQjcyOxn5al7vF2QkP9SIpVV6ggnY=.0bdff0b8-05d3-4025-bd52-3a88f3f68148@github.com> Message-ID: <fZk-diyxDX9KODrL7ifQDfShXPrubsZ6oud67M3oFIs=.e2bce418-d767-476c-837c-f956304c4d69@github.com> On Wed, 16 Nov 2022 08:13:38 GMT, Andrey Turbanov <aturbanov at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> fix copyright > > test/hotspot/jtreg/runtime/ErrorHandling/HsErrFileUtils.java line 35: > >> 33: * Given the output of a java VM that crashed, extract the name of the hs-err file from the output >> 34: */ >> 35: static public String extractHsErrFileNameFromOutput(OutputAnalyzer 
output) { > > nit: let's use blessed modifiers order > Suggestion: > > public static String extractHsErrFileNameFromOutput(OutputAnalyzer output) { I had to look this up :-) Sure. ------------- PR: https://git.openjdk.org/jdk/pull/11122 From chagedorn at openjdk.org Wed Nov 16 08:51:04 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 16 Nov 2022 08:51:04 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> Message-ID: <ZOj-Cb86tjWGXWJK8AQx4EB81bcerOGONb4P6vkPZkw=.0fb827e5-8c50-4174-a0fd-4f1142b8d4f2@github.com> On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. 
But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed Yes, I think it would be good to get a second review of the DWARF parser changes. Maybe @tstuefe? 
------------- PR: https://git.openjdk.org/jdk/pull/10287 From stuefe at openjdk.org Wed Nov 16 08:51:04 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 08:51:04 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: <ZOj-Cb86tjWGXWJK8AQx4EB81bcerOGONb4P6vkPZkw=.0fb827e5-8c50-4174-a0fd-4f1142b8d4f2@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> <ZOj-Cb86tjWGXWJK8AQx4EB81bcerOGONb4P6vkPZkw=.0fb827e5-8c50-4174-a0fd-4f1142b8d4f2@github.com> Message-ID: <ofIMTnRPIIqTzVGjaaaBedoelHOGrD3aKbEi79M7RT8=.655c1544-0c68-4c21-90e1-38c276bb731e@github.com> On Wed, 16 Nov 2022 08:46:15 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: > Yes, I think it would be good to get a second review of the DWARF parser changes. Maybe @tstuefe? I'm a bit swamped, but try to take a look later today. ------------- PR: https://git.openjdk.org/jdk/pull/10287 From stuefe at openjdk.org Wed Nov 16 08:53:01 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 08:53:01 GMT Subject: RFR: JDK-8296931: NMT tests slowed down considerably by JDK-8242181 In-Reply-To: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> References: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> Message-ID: <-INi0NwqX4R4hkPwr5tCV13_vIEmJ6gq8KrEcnVx1bI=.b6282c57-5cd9-411c-bf23-7ace4d639ba4@github.com> On Mon, 14 Nov 2022 12:43:41 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > We noticed that NMT tests on our slower PPC machines started failing. > > The reason is that NMT detail reports have become 2-5x slower. This is caused by us now parsing the dwarf debug information to extract source information for each PC in each call stack. That is nice but costly. 
> > The slowdown is not limited to PPC, it affects all Elf platforms. On my Linux x64 box, runtime/NMT/VirtualAllocCommitMerge.java increased from 20 to 90 seconds. > > --- > > This patch simply removes source info from NMT call stacks. They are not that important for pinpointing leaks and such. I considered more involved solutions, like making them optional via an argument to the NMT report command, but decided against it. The added benefit would be small, not worth much complexity. > > With this patch, on my box with -conc 4 all NMT together are about 2.5 x faster (2m56 -> 1m09). No feedback from David. Since this matter is a bit pressing - affects runtime of our GHAs and internal tests - I'll commit now. ------------- PR: https://git.openjdk.org/jdk/pull/11135 From stuefe at openjdk.org Wed Nov 16 08:54:29 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 08:54:29 GMT Subject: Integrated: JDK-8296931: NMT tests slowed down considerably by JDK-8242181 In-Reply-To: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> References: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> Message-ID: <OOEXeqvV9N_gjFIY36bJuamUa4HfhQmYap0raMYnN7M=.008a78e3-ea32-406c-8f16-85404a6dcb8e@github.com> On Mon, 14 Nov 2022 12:43:41 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > We noticed that NMT tests on our slower PPC machines started failing. > > The reason is that NMT detail reports have become 2-5x slower. This is caused by us now parsing the dwarf debug information to extract source information for each PC in each call stack. That is nice but costly. > > The slowdown is not limited to PPC, it affects all Elf platforms. On my Linux x64 box, runtime/NMT/VirtualAllocCommitMerge.java increased from 20 to 90 seconds. > > --- > > This patch simply removes source info from NMT call stacks. They are not that important for pinpointing leaks and such. 
I considered more involved solutions, like making them optional via an argument to the NMT report command, but decided against it. The added benefit would be small, not worth much complexity. > > With this patch, on my box with -conc 4 all NMT together are about 2.5 x faster (2m56 -> 1m09). This pull request has now been integrated. Changeset: 5e08b3f4 Author: Thomas Stuefe <stuefe at openjdk.org> URL: https://git.openjdk.org/jdk/commit/5e08b3f40e04254276fc2d37c523cb06b121861a Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod 8296931: NMT tests slowed down considerably by JDK-8242181 Reviewed-by: chagedorn, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/11135 From xlinzheng at openjdk.org Wed Nov 16 08:55:02 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 16 Nov 2022 08:55:02 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default [v3] In-Reply-To: <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> Message-ID: <U1NpJvngJSNo4l_0Gqau-_FUz2TyitSKElJtgzl7-Gc=.aa9e8196-cf58-454e-8eb8-c8a558300257@github.com> On Wed, 16 Nov 2022 05:21:52 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. 
>> >> >>> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" >> bool UseRVA20U64 = true {ARCH product} {default} >> bool UseRVC = true {ARCH product} {default} >> openjdk version "20-internal" 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html >> [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > minor issue if users specify command line -XX:+UseRVA20U64 and RVC is not supported > Given the RVC is a _mandatory_ extension of `RVA20U64`, but also that `RVA20U64` can imply other extensions ([full list of mandatory extensions](https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#512-rva20u64-mandatory-extensions)), I wouldn't just disable `RVA20U64` when RVC isn't available, but I would fail completely the JVM. Getting in such case would mean that a hardware is implementing only parts of the **mandatory** list of extensions for a given profile. > > I understand that it puts the approach of this PR in question though, and I'm wondering why we need to enable a whole profile when we are trying to enable only one of the feature (RVC) and we also have feature flag detection for this feature (`_features & CPU_C`). It would just be better IMO to do proper feature detection and use that to enable/disable features, just like other platforms. Do you mean by directly using a solution like [1] without touching `UseRVA20U64`? I think I am okay with any case. 
[1] https://github.com/openjdk/jdk/pull/11155/commits/b5b9c64529c27c40542f8cda720652fabf70682d ------------- PR: https://git.openjdk.org/jdk/pull/11155 From lucy at openjdk.org Wed Nov 16 09:05:51 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 16 Nov 2022 09:05:51 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <Zidix36ENoGghAxz0b-yEA1WhVpRur9Sx8t13agCPQc=.7ec1a812-2eea-4c22-8693-e2981d0de263@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> <nz9IPjZuX8QnocI6yC68IrCccWJHNZ4pupJJniO7hkE=.c224740b-a81b-4804-8d3a-ce98ed8e87f4@github.com> <Zidix36ENoGghAxz0b-yEA1WhVpRur9Sx8t13agCPQc=.7ec1a812-2eea-4c22-8693-e2981d0de263@github.com> Message-ID: <CbqGt6e1j8_OqIfupugWJxHsZY0AL4oD1nvKt458APw=.75b6366f-4a8a-4091-b9ff-c1c140147181@github.com> On Wed, 16 Nov 2022 06:43:29 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> src/hotspot/share/utilities/utf8.cpp line 521: >> >>> 519: } else { >>> 520: if (p + 6 >= end) break; // string is truncated >>> 521: os::snprintf(p, 7, "\\u%04x", c); >> >> This should be 6, right? We have 6 characters left before end, assuming end is exclusive. >> >> Also, maybe use a named constant? > > If 6 is used, there is an output-truncated warning and only 5 characters are actually filled. The terminating null/zero is counted in, I think. To make it easier to read, I added a comment. For snprintf, all bytes written to the buffer (including the terminating \0) are counted. You have 6 bytes for character encoding ("\uxxxx") and one byte for \0. The code is correct. 
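The byte count in this reply can be checked with a short sketch. It uses plain `snprintf` rather than `os::snprintf`, and the function name `write_unicode_escape` is hypothetical: the escape occupies six characters plus one byte for the terminating NUL, so a size argument of 6 leaves room for only five characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: format code unit c as a \uXXXX escape, passing `size`
// straight through as snprintf's size argument. Returns the number of
// characters actually stored (excluding the terminating NUL).
static size_t write_unicode_escape(char* out, size_t size, unsigned c) {
  assert(out != nullptr && size > 0);
  snprintf(out, size, "\\u%04x", c);   // size includes the NUL byte
  return strlen(out);
}
```

With a size of 7 the full six-character escape is stored; with a size of 6 the last hex digit is dropped, which matches the truncation warning mentioned in the quoted reply.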
------------- PR: https://git.openjdk.org/jdk/pull/11115 From chagedorn at openjdk.org Wed Nov 16 09:07:07 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 16 Nov 2022 09:07:07 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> Message-ID: <KsZ_zoL91I2XpA4CGT5-wtuPi4CYfFn3LlAPWOpAOrM=.8a304e68-1e28-4f3c-b0b8-4a24377a1bab@github.com> On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. 
>> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed Thank you Thomas! Take your time, there is no hurry. 
------------- PR: https://git.openjdk.org/jdk/pull/10287 From stuefe at openjdk.org Wed Nov 16 09:10:01 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 09:10:01 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <HyLZBgVxonD9zj8kI96k0aZnWSO9Yk_-cxmAVuW0ZNM=.04766c2b-469e-424d-af7d-d2c428181bc3@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> <wpIZeGRfbop46i3T4kS9jRfkIiNuXFKvo17ymbsQPEw=.573524bb-48db-4329-9571-10480ce8ee8d@github.com> <fWo7Fw-iffnP6tHcRUeRCy5Q92_nN2k5qG-B5zTmleU=.72f72ac3-f7ae-4c33-a7b2-04043e6952df@github.com> <2Ja9uaGY95zatV-viOoTkNbakqLkWuCDn761HztySZU=.f42b4b8a-2be1-41b1-98c0-12f953e6a88b@github.com> <HyLZBgVxonD9zj8kI96k0aZnWSO9Yk_-cxmAVuW0ZNM=.04766c2b-469e-424d-af7d-d2c428181bc3@github.com> Message-ID: <cthdqge57PbbrQVd46zbRS9fhxqeHV_vM8GGEsmlrYA=.ed878d6f-0df1-40cf-b24c-90b5d8d305c7@github.com> On Wed, 16 Nov 2022 05:45:34 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> A result of -1 only occurs for an encoding error. An encoding error is only >> possible with multi-byte / wide characters. (See the definition of "encoding >> error" in C99 7.19.3/14.) We don't use those, so there won't be any encoding >> errors, so our uses of snprintf never return -1. > > Updated to use the result from `os::snprintf` in the new commit. > A result of -1 only occurs for an encoding error. An encoding error is only possible with multi-byte / wide characters. (See the definition of "encoding error" in C99 7.19.3/14.) We don't use those, so there won't be any encoding errors, so our uses of snprintf never return -1. Hi @kimbarrett, I am not sure this is true. E.g. 
https://stackoverflow.com/questions/65334245/what-is-an-encoding-error-for-sprintf-that-should-return-1 cites some cases where snprintf returns -1 that have nothing to do with multibyte strings. Also, size=0 would return -1 according to SUSv2. Note glibc differs and returns the number of chars it *would* have printed. Which is also dangerous in a different way. If you use that number to update the position, the position is not limited to buffer boundaries. So, I think the result of os::snprintf should not be used to update buffer position, at least not without checking. ------------- PR: https://git.openjdk.org/jdk/pull/11115 From stefank at openjdk.org Wed Nov 16 09:19:01 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 09:19:01 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v8] In-Reply-To: <yjTlwjvdvgoyI6ScfYzSvhK54QiXMAAnT0gO0nroB3o=.ca108a44-c1a3-449d-8f95-259a1d50131b@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <yjTlwjvdvgoyI6ScfYzSvhK54QiXMAAnT0gO0nroB3o=.ca108a44-c1a3-449d-8f95-259a1d50131b@github.com> Message-ID: <4XP5mF36J2i2ftjKr21gkDhQhJXtNdpID7wcLVd6IP8=.6fbe9884-cae6-4c09-8aa8-e76162e2133a@github.com> On Tue, 15 Nov 2022 11:08:15 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. >> >> I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. >> >> This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. 
>> >> Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see JDK-[JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages where forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure an initial capacity. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Spelling Thanks for the reviews! ------------- PR: https://git.openjdk.org/jdk/pull/11086 From stefank at openjdk.org Wed Nov 16 09:19:01 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 09:19:01 GMT Subject: RFR: 8297020: Rename GrowableArray::on_stack In-Reply-To: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> References: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> Message-ID: <cizHOCYHVSXpXJn9Am_yfqmtpLv8S11dP73tKytobys=.a547697a-536d-4299-9d74-b15a45cbfcef@github.com> On Tue, 15 Nov 2022 11:09:56 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > GrowableArray::on_stack is confusing. It returns true if the backing elements are allocated in the stack's resource area. We typically call this resource allocations, not stack allocations. I propose that we rename it. Thanks for the reviews! 
------------- PR: https://git.openjdk.org/jdk/pull/11161 From stuefe at openjdk.org Wed Nov 16 09:44:08 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 09:44:08 GMT Subject: RFR: 8297020: Rename GrowableArray::on_stack In-Reply-To: <cizHOCYHVSXpXJn9Am_yfqmtpLv8S11dP73tKytobys=.a547697a-536d-4299-9d74-b15a45cbfcef@github.com> References: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> <cizHOCYHVSXpXJn9Am_yfqmtpLv8S11dP73tKytobys=.a547697a-536d-4299-9d74-b15a45cbfcef@github.com> Message-ID: <I27s6Q-dVi2u3PL0lbcX3pOnwTC_L_-jU0HoW3IOlsg=.f4b9bf6b-1388-409e-8d15-afa8fad36527@github.com> On Wed, 16 Nov 2022 09:15:15 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Thanks for the reviews! Thanks for doing all this! ------------- PR: https://git.openjdk.org/jdk/pull/11161 From lkorinth at openjdk.org Wed Nov 16 09:44:17 2022 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 16 Nov 2022 09:44:17 GMT Subject: RFR: 8296774: Removed default MEMFLAGS value from CHeapBitMap [v3] In-Reply-To: <ZTeey3fKU3nGct4NWNzjPeWMXEnytpMmoVFG-f2GW7Y=.6caf0cbd-8c4d-49ae-aeeb-a0681e759c75@github.com> References: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com> <ZTeey3fKU3nGct4NWNzjPeWMXEnytpMmoVFG-f2GW7Y=.6caf0cbd-8c4d-49ae-aeeb-a0681e759c75@github.com> Message-ID: <3AuxcusThPRD55gEUaEaEdTFjRuTnRc_fyM87mv1msY=.d06fcb4c-85d3-4793-bd5a-adee06d34d14@github.com> On Tue, 15 Nov 2022 10:11:20 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today it is easy to accidentally create CHeapBitMaps that uses the default mtInternal MEMFLAGS instead of a value that is suitable for the subsystem. I fixed the instances I could find with #10948 / [JDK-8296231](https://bugs.openjdk.org/browse/JDK-8296231). >> >> For that PR I didn't want to change the constructors of the bitmap because #10941 / [JDK-8296139](https://bugs.openjdk.org/browse/JDK-8296139) was being out for review. 
Now when that change has been pushed I'd like to change the constructors of the CHeapBitMap, so that we don't accidentally make these mistakes. >> >> When making it mandatory to pass MEMFLAGS, it becomes apparent that the current parameter order is a bit odd. If you look closely you see that all three parameters are optional. When I now want to make MEMFLAGS mandatory, I'd like to move it so that it always is the first parameter. This will simplify the constructors a bit, IMHO. >> >> This is what the constructors look like before the patch: >> >> CHeapBitMap() : CHeapBitMap(mtInternal) {} >> explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} >> CHeapBitMap(idx_t size_in_bits, MEMFLAGS flags = mtInternal, bool clear = true); >> >> >> And I'd like to change it to: >> >> explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} >> CHeapBitMap(MEMFLAGS flags, idx_t size_in_bits, bool clear = true); >> >> >> In effect, this makes `flags` mandatory and `size_in_bits` and `clear` optional. >> >> We could probably condense this even further into just one constructor: >> >> explicit CHeapBitMap(MEMFLAGS flags, size_t size_in_bits = 0, bool clear = true) : GrowableBitMap(size_in_bits, clear), _flags(flags) {} >> >> >> given that the value of `clear` doesn't matter when `size_in_bits` is 0. I didn't do that, but could be swayed to do that. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Align parameter order with GrowableArray changes Looks good, thanks for making it harder to do the wrong thing by mistake. ------------- Marked as reviewed by lkorinth (Reviewer). 
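The constructor change discussed in this review can be sketched in isolation. This is only an illustration under stated assumptions: `MEMFLAGS`, `idx_t`, and `GrowableBitMap` below are simplified stand-ins rather than the HotSpot definitions, and only the two proposed `CHeapBitMap` constructor signatures are taken from the message above. The sketch shows that once `flags` has no default, a bitmap can no longer be created without naming a memory category.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Simplified stand-ins for the HotSpot types (assumptions, for illustration only).
enum MEMFLAGS { mtInternal, mtGC, mtClass };
typedef size_t idx_t;

class GrowableBitMap {
  idx_t _size;
  bool  _clear;
 public:
  GrowableBitMap(idx_t size_in_bits, bool clear) : _size(size_in_bits), _clear(clear) {}
  idx_t size() const { return _size; }
  bool cleared_on_init() const { return _clear; }
};

class CHeapBitMap : public GrowableBitMap {
  const MEMFLAGS _flags;
 public:
  // Proposed shape: flags is mandatory and always the first parameter.
  explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {}
  CHeapBitMap(MEMFLAGS flags, idx_t size_in_bits, bool clear = true)
    : GrowableBitMap(size_in_bits, clear), _flags(flags) {}
  MEMFLAGS flags() const { return _flags; }
};

// With the old default-taking constructors gone, both of these fail to compile:
//   CHeapBitMap a;        // no default constructor
//   CHeapBitMap b(idx_t(128)); // a size alone is not enough
static_assert(!std::is_default_constructible<CHeapBitMap>::value,
              "a memory category must always be given");
static_assert(!std::is_constructible<CHeapBitMap, idx_t>::value,
              "a size without a memory category is rejected");
```

A caller would then write, for example, `CHeapBitMap bm(mtGC, 128);`, which makes the accounting category visible at every construction site.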
PR: https://git.openjdk.org/jdk/pull/11084 From redestad at openjdk.org Wed Nov 16 10:44:58 2022 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 16 Nov 2022 10:44:58 GMT Subject: RFR: 8296429: Remove os::supports_sse In-Reply-To: <I-IJQQkOSt4gv6LJXkpgAijXnE_HLk9j-Zms27EtbJk=.fe87caa9-7079-4666-a47e-d653087f8a44@github.com> References: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> <I-IJQQkOSt4gv6LJXkpgAijXnE_HLk9j-Zms27EtbJk=.fe87caa9-7079-4666-a47e-d653087f8a44@github.com> Message-ID: <OdjXYfbQkiNiWv-glaSPEiKHZccRnIQUjLiPQSntc-c=.b4308c06-8fe3-41fe-a4ea-b5473415f402@github.com> On Tue, 15 Nov 2022 13:06:28 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> os::supports_sse only exists to be backwards compatible with linux kernels older than 2.4, which may not have SSE support. Since support for 2.2.x kernels ended in 2004 I think we can safely clean this out. > > This was only relevant on 32-bit anyway, right? > > Looks good. > > 2.4 was released in 2001; I think it's safe to remove. @tstuefe do you agree if I call this trivial and integrate?
------------- PR: https://git.openjdk.org/jdk/pull/11164 From eosterlund at openjdk.org Wed Nov 16 10:45:02 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 16 Nov 2022 10:45:02 GMT Subject: RFR: 8296774: Removed default MEMFLAGS value from CHeapBitMap [v3] In-Reply-To: <ZTeey3fKU3nGct4NWNzjPeWMXEnytpMmoVFG-f2GW7Y=.6caf0cbd-8c4d-49ae-aeeb-a0681e759c75@github.com> References: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com> <ZTeey3fKU3nGct4NWNzjPeWMXEnytpMmoVFG-f2GW7Y=.6caf0cbd-8c4d-49ae-aeeb-a0681e759c75@github.com> Message-ID: <wgYPBpC6kpnLl_hQNxdYUYpLG9WfWSipT71sHvhwXQM=.26547de3-612f-4a18-927f-0718e2fa4b0e@github.com> On Tue, 15 Nov 2022 10:11:20 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today it is easy to accidentally create CHeapBitMaps that uses the default mtInternal MEMFLAGS instead of a value that is suitable for the subsystem. I fixed the instances I could find with #10948 / [JDK-8296231](https://bugs.openjdk.org/browse/JDK-8296231). >> >> For that PR I didn't want to change the constructors of the bitmap because #10941 / [JDK-8296139](https://bugs.openjdk.org/browse/JDK-8296139) was being out for review. Now when that change has been pushed I'd like to change the constructors of the CHeapBitMap, so that we don't accidentally make these mistakes. >> >> When making it mandatory to pass MEMFLAGS, it becomes apparent that the current parameter order is a bit odd. If you look closely you see that all three parameters are optional. When I now want to make MEMFLAGS mandatory, I'd like to move it so that it always is the first parameter. This will simplify the constructors a bit, IMHO.
>> >> This is what the constructors look like before the patch: >> >> CHeapBitMap() : CHeapBitMap(mtInternal) {} >> explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} >> CHeapBitMap(idx_t size_in_bits, MEMFLAGS flags = mtInternal, bool clear = true); >> >> >> And I'd like to change it to: >> >> explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} >> CHeapBitMap(MEMFLAGS flags, idx_t size_in_bits, bool clear = true); >> >> >> In effect, this makes `flags` mandatory and `size_in_bits` and `clear` optional. >> >> We could probably condense this even further into just one constructor: >> >> explicit CHeapBitMap(MEMFLAGS flags, size_t size_in_bits = 0, bool clear = true) : GrowableBitMap(size_in_bits, clear), _flags(flags) {} >> >> >> given that the value of `clear` doesn't matter when `size_in_bits` is 0. I didn't do that, but could be swayed to do that. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Align parameter order with GrowableArray changes Marked as reviewed by eosterlund (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11084 From stefank at openjdk.org Wed Nov 16 10:56:43 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 10:56:43 GMT Subject: Integrated: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray In-Reply-To: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> Message-ID: <tIIxjfKjrG9XsVcSvkCVuD12FGcbzRwmEr2_F-3yiJE=.c9e456af-e846-48d9-81f3-af8ce35741af@github.com> On Thu, 10 Nov 2022 09:40:44 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Today we use mtNone to denote that a GrowableArray should *NOT* be backed by a CHeap allocated array. 
> > I've gotten feedback that it would probably be good to limit the usage of mtNone, and at some point maybe completely remove it. > > This patch takes a small step to remove mtNone from the GrowableArray. What's left is only asserts to forbid that value. Those asserts will be trivial to remove when/if mtNone is removed. > > Just like in the proposed patch to make MEMFLAGS non-optional in CHeapBitMap (see JDK-[JDK-8296774](https://bugs.openjdk.org/browse/JDK-8296774)), I have thrown around the parameter order for GrowableArray. When looking at the changes to the usages of CHeap-backed GrowableArrays it becomes apparent that all of these usages where forced to provide a value for the initial capacity. When MEMFLAGS move to the front, we can now skip having to figure an initial capacity. This pull request has now been integrated. Changeset: 5f51dff6 Author: Stefan Karlsson <stefank at openjdk.org> URL: https://git.openjdk.org/jdk/commit/5f51dff6971d0f7ec7fd8e829a856fc4a45a7f3c Stats: 48 lines in 1 file changed: 30 ins; 5 del; 13 mod 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray Reviewed-by: sspitsyn, xliu, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/11086 From stefank at openjdk.org Wed Nov 16 11:02:41 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 11:02:41 GMT Subject: RFR: 8297020: Rename GrowableArray::on_stack [v2] In-Reply-To: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> References: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> Message-ID: <N3N_ggiPb81qBezCOyzwKytkOYFpFn3uMvz1Ja6QTgg=.24d6b2f9-95da-415c-8736-99852eb508c0@github.com> > GrowableArray::on_stack is confusing. It returns true if the backing elements are allocated in the stack's resource area. We typically call this resource allocations, not stack allocations. I propose that we rename it. 
Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: - Rename GrowableArray::on_stack - Spelling - Review tstuefe - Move now unnecessary 'explicit' specifier - Try to fix 32-bit builds - Merge remote-tracking branch 'upstream/master' into 8296776_growablearray_mtnone_cleanout_review - Review dholmes - Mark constructors explicit - 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11161/files - new: https://git.openjdk.org/jdk/pull/11161/files/c16e0336..c16e0336 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11161&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11161&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11161.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11161/head:pull/11161 PR: https://git.openjdk.org/jdk/pull/11161 From stefank at openjdk.org Wed Nov 16 11:05:59 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 11:05:59 GMT Subject: RFR: 8296926: Sort include lines of files in the include/ directory [v4] In-Reply-To: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> Message-ID: <Ui-S20RkgTtaXg8YyowdihZ7MAxV8yTEidms18YpKUQ=.adc86da6-1326-448d-ad8e-73be100a6d14@github.com> > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. 
One argument why it was OK to have the files in include/ pushed up to the top of the sorted block was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Remove include/ from test/hotspot files - Merge remote-tracking branch 'upstream/master' into 8296926_proper_include_lines_for_include_dir_files - Revert make file changes - Remove include/ from includes - 8296926: Use proper include lines for files in include/ ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11133/files - new: https://git.openjdk.org/jdk/pull/11133/files/92cba2ea..e9b7a5c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11133&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11133&range=02-03 Stats: 4236 lines in 147 files changed: 2626 ins; 894 del; 716 mod Patch: https://git.openjdk.org/jdk/pull/11133.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11133/head:pull/11133 PR: https://git.openjdk.org/jdk/pull/11133 From stefank at openjdk.org Wed Nov 16 11:06:03 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 11:06:03 GMT Subject: Integrated: 8297020:
Rename GrowableArray::on_stack In-Reply-To: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> References: <cGKWN8in4tXnTutbbEmbFZGx-dof1XlYi6J1X3bCYcA=.b7f16e95-ba32-46ac-8b22-de8c51ca151d@github.com> Message-ID: <fWBIINPveVEcEXVD6gLv5viDbDsTFID1XkxYqEu4AvU=.0117a2ba-e160-4eea-95b8-c527b2e329f9@github.com> On Tue, 15 Nov 2022 11:09:56 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > GrowableArray::on_stack is confusing. It returns true if the backing elements are allocated in the stack's resource area. We typically call this resource allocations, not stack allocations. I propose that we rename it. This pull request has now been integrated. Changeset: 196d0210 Author: Stefan Karlsson <stefank at openjdk.org> URL: https://git.openjdk.org/jdk/commit/196d0210df740fe26ca674973519a30b634a6b3a Stats: 22 lines in 3 files changed: 0 ins; 0 del; 22 mod 8297020: Rename GrowableArray::on_stack Reviewed-by: stuefe, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/11161 From lkorinth at openjdk.org Wed Nov 16 11:06:00 2022 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 16 Nov 2022 11:06:00 GMT Subject: RFR: 8296926: Sort include lines of files in the include/ directory [v4] In-Reply-To: <Ui-S20RkgTtaXg8YyowdihZ7MAxV8yTEidms18YpKUQ=.adc86da6-1326-448d-ad8e-73be100a6d14@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> <Ui-S20RkgTtaXg8YyowdihZ7MAxV8yTEidms18YpKUQ=.adc86da6-1326-448d-ad8e-73be100a6d14@github.com> Message-ID: <kXbjbISue3PCTpnD1eAs31EE4M8JGm7DhMRZkY5IOvA=.38a460fa-ac33-473f-89f9-e8dab2382b64@github.com> On Wed, 16 Nov 2022 11:02:19 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. 
There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK to have the files in include/ pushed up to the top of the sorted block was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. >> >> This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Remove include/ from test/hotspot files > - Merge remote-tracking branch 'upstream/master' into 8296926_proper_include_lines_for_include_dir_files > - Revert make file changes > - Remove include/ from includes > - 8296926: Use proper include lines for files in include/ Although I prefer the original solution with path, I think the current version is also a good improvement. Approved. Thanks for fixing this! ------------- Marked as reviewed by lkorinth (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11133 From luhenry at openjdk.org Wed Nov 16 11:06:47 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 16 Nov 2022 11:06:47 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default [v3] In-Reply-To: <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> Message-ID: <ichA8JH8FSz25l7ONINVLvj1LM77AP9HI8WSi-Ub3q8=.3647cc04-b1e6-4c60-a8e5-9c6ba9d52449@github.com> On Wed, 16 Nov 2022 05:21:52 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. >> >> >>> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" >> bool UseRVA20U64 = true {ARCH product} {default} >> bool UseRVC = true {ARCH product} {default} >> openjdk version "20-internal" 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html >> [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > minor issue if users specify command line -XX:+UseRVA20U64 and RVC is not supported I mean something like https://github.com/openjdk/jdk/commit/d75b565dabc1dab3c508d2b4b83d34af5a1c7a35 (it hasn't been built or tested). 
------------- PR: https://git.openjdk.org/jdk/pull/11155 From aph at openjdk.org Wed Nov 16 11:06:46 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 11:06:46 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v7] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <y8PbnEuueI86HzraVE8471Mk_n6fqi5oAjJ7p-Oa8nc=.ae1c3ebd-8721-40a7-b31e-87746e267e4e@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/e1063d7b..1bd9d47a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 16 11:06:47 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 11:06:47 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <5hAVImQZUFPiE99N8uZ5KC7YUGHEkYglmftttBxZln4=.d469624e-5ddc-4c13-bc64-ed0c73ceb5a2@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> 
<nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <5hAVImQZUFPiE99N8uZ5KC7YUGHEkYglmftttBxZln4=.d469624e-5ddc-4c13-bc64-ed0c73ceb5a2@github.com> Message-ID: <KZP5SV2QP7DmleReNRTHuXGWnXOI8ui-keo8bE5HevA=.76c59d85-6f9d-42ee-9f87-a6f21bb6fee8@github.com> On Tue, 15 Nov 2022 20:11:36 GMT, Alan Bateman <alanb at openjdk.org> wrote: > I'd prefer to keep this as is because it's too much to use "threads" or mention inheritance in the first paragraph. I agree. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 16 11:06:51 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 11:06:51 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> Message-ID: <gvyVzC0NPYwWrx_xHSNugz9DDryV-XAbMwFuzQgpIJE=.b1d10622-d1c9-4e21-8d05-4ada5cf605ec@github.com> On Tue, 15 Nov 2022 18:35:06 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix failing serviceability tests > > src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 172: > >> 170: * must be an integer power of 2. >> 171: * >> 172: * <p>For example, you could use {@code -Djdk.incubator.concurrent.ScopedValue.cacheSize=8}. > > I would also avoid "you" and "we" in the javadoc. While javadoc is not formal, we often use locutions such as "clients can use/do XYZ". OK, it's a style thing. Wilco. 
> src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 185: > >> 183: * would have to be regenerated after a blocking operation. >> 184: * >> 185: * @param <T> the type of the value > > Suggestion: > > * @param <T> the type of the scoped value Mm, but this is the type of the value of the `ScopedValue` instance. So, the type of the scoped value is `ScopedValue<T>`, the type of the value is `T`, is it not? ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 16 11:06:51 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 11:06:51 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <CRejKgLiOJHhFPCEfxGBnnNGZiFhES4Ac9Dqtgq2VWY=.12932aae-9015-4452-9a7c-4c59b1981495@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <XEqyhW1QAJKrizAC-gOblrN7Pa1F_f2C8I5XO9RJqBM=.bc08c201-9382-43a6-a68b-b809028524a8@github.com> <CRejKgLiOJHhFPCEfxGBnnNGZiFhES4Ac9Dqtgq2VWY=.12932aae-9015-4452-9a7c-4c59b1981495@github.com> Message-ID: <b96D_PGxmI4uKC9L9GBappJWwfZS_ken4e7ho3HCoA4=.5b5db3c9-316d-46a9-9e7f-3f85ff46d73b@github.com> On Tue, 15 Nov 2022 21:28:29 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This comment is on orElse but I suspect you are suggesting that get() be changed to return Optional. I think we'll need to get more feedback/usage of this API before re-visiting that. > > Yes, my comment was really on `get` - that said, I note that saying get().get() would look odd (but maybe finding some other name for `ScopedValue::get`, such as `find` might work) It certainly would look odd. 
This API is, by design, as lightweight as it possibly can be, both from an implementation and a user's point of view. It's also intended to be as close as possible to an "invisible" parameter passed to all callees. From that point of view, `get()` is a wart. `get().get()` is just... ------------- PR: https://git.openjdk.org/jdk/pull/10952 From stuefe at openjdk.org Wed Nov 16 11:11:07 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 11:11:07 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v7] In-Reply-To: <lTTxdskiTr0w5EmZ0xKWQlSBCfhePMpHV18cpzWh_pE=.2da73c6b-cd06-4a9b-89ba-213fc10cb8f5@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <lTTxdskiTr0w5EmZ0xKWQlSBCfhePMpHV18cpzWh_pE=.2da73c6b-cd06-4a9b-89ba-213fc10cb8f5@github.com> Message-ID: <3Z4ZjjimT7rK922Mho3uBBr59f8InHziFo3_Q10_Eyo=.d056cd57-4f64-48f4-862e-e210c0d317b1@github.com> On Wed, 16 Nov 2022 07:03:12 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> sprintf is deprecated in Xcode 14 because of security concerns, and its use causes build failures. The build could pass if warnings are disabled for code that uses sprintf. In the long run, sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > address review comments I don't think it is safe to use the return value of `os::snprintf` to update buffer positions. `os::snprintf()` calls `os::vsnprintf()` which, at least on POSIX, returns the return value of `vsnprintf(3)` verbatim. If native `vsnprintf(3)` conforms to SUSv2 [1], it will return <0 if the buffer size is zero.
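The hazard raised in this message, advancing a write position by the unchecked return value of an snprintf-style call, can be avoided by validating the result first. The following is a minimal sketch, assuming the standard C99 `vsnprintf` rather than `os::snprintf`; `append_checked` is a hypothetical helper invented for illustration and is not part of HotSpot.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not HotSpot code): appends formatted text at buf + *pos
// and advances *pos only by what actually fits into the buffer.
// Returns true if the whole text fit, false on error or truncation.
bool append_checked(char* buf, size_t size, size_t* pos, const char* fmt, ...) {
  if (*pos >= size) {
    return false;               // no room left, not even for the terminating NUL
  }
  const size_t remaining = size - *pos;
  va_list ap;
  va_start(ap, fmt);
  const int n = vsnprintf(buf + *pos, remaining, fmt, ap);
  va_end(ap);
  if (n < 0) {
    return false;               // error result: leave the position untouched
  }
  if (static_cast<size_t>(n) >= remaining) {
    *pos = size - 1;            // truncated: clamp to the terminating NUL
    return false;
  }
  *pos += static_cast<size_t>(n);
  return true;
}
```

With an 8-byte buffer, appending "abc" and then "defghij" through this helper leaves the position at 7 and the buffer holding "abcdefg", whereas adding the raw return values would have moved the position to 10, past the end of the buffer.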
So for cases where we calculate the buffer size based on what was written, and we are just at the edge of the buffer, we would move the next write position backward. But much worse: if the native `vsnprintf(3)` conforms to C99 (e.g. glibc, BSD libc) they return "number of characters (not including the terminating null character) which would have been written to buffer if bufsz was ignored". [2] So, if the buffer was too small and we truncated, and we use the return value to calculate the next write position, we will write outside the buffer boundaries. Regardless of what behavior we have - C99 or SUSv2 - we cannot just use the return value to update the next write position without first checking the return value. We could - and probably should - decide on C99 or SUSv2 behavior for all platforms, and modify `os::snprintf()` to provide that in an OS-independent way. But unless we decide on some mongrel of both C99 and SUSv2 behaviors, the problem remains. Cheers, Thomas [1] https://pubs.opengroup.org/onlinepubs/7908799/xsh/fprintf.html [2] https://en.cppreference.com/w/c/io/fprintf ------------- PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Wed Nov 16 11:20:57 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 11:20:57 GMT Subject: RFR: 8296429: Remove os::supports_sse In-Reply-To: <I-IJQQkOSt4gv6LJXkpgAijXnE_HLk9j-Zms27EtbJk=.fe87caa9-7079-4666-a47e-d653087f8a44@github.com> References: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> <I-IJQQkOSt4gv6LJXkpgAijXnE_HLk9j-Zms27EtbJk=.fe87caa9-7079-4666-a47e-d653087f8a44@github.com> Message-ID: <DGVPMxYCtJMrUBaFxZY3F4Hxg9bAG_-sRBcrOT8my80=.34e08747-487e-478b-97dc-2268b7eff352@github.com> On Tue, 15 Nov 2022 13:06:28 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> os::supports_sse only exists to be backwards compatible with linux kernels older than 2.4, which may not have SSE support. 
Since support for 2.2.x kernels ended in 2004 I think we can safely clean this out. > > This was only relevant on 32-bit anyway right? > > Looks good. > > 2.4. was released in 2001, I think its safe to remove. > @tstuefe do you agree if I call this trivial and integrate? Looks trivial to me. Should anyone have problems with the VM on pre-2004 unpatched 32-bit kernels, this code can be very easily revived. ------------- PR: https://git.openjdk.org/jdk/pull/11164 From stuefe at openjdk.org Wed Nov 16 11:37:48 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 11:37:48 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v3] In-Reply-To: <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> Message-ID: <YVUjKRi6jttqnn71S7zF-WXzxwzkNozXsZDt1_9yMTE=.908c540d-8c72-4ab7-be30-ffcbf6932290@github.com> On Mon, 14 Nov 2022 07:29:25 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. >> >> To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. >> >> --- >> >> Patch >> >> - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. 
>> - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. >> - Removed a stray newline from print_native_stack to clean output. >> - added regression testing for this feature. I removed my name from the test since we don't do this anymore. >> - added clarifying comments to the test and code >> - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) >> >> Output looks like this: >> >> >> $ java ... -XX:+ErrorLogSecondaryErrorDetails >> >> >> will produce, for secondary errors, siginfo and call stack. >> >> >> [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] >> [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] >> [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) >> V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) >> V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) >> V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) >> V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) >> V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) >> C [libc.so.6+0x43090] >> V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) >> C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) >> C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) >> ] > > Thomas Stuefe has updated the 
pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors' of github.com:tstuefe/jdk into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors > - Feedback David Hi @xmas92, would you like to review this one? This is an attempt to help with analyzing secondary crashes, in order to better understand what problems we face when e.g. printing registers. ------------- PR: https://git.openjdk.org/jdk/pull/11118 From mcimadamore at openjdk.org Wed Nov 16 11:45:41 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 16 Nov 2022 11:45:41 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <gvyVzC0NPYwWrx_xHSNugz9DDryV-XAbMwFuzQgpIJE=.b1d10622-d1c9-4e21-8d05-4ada5cf605ec@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <gvyVzC0NPYwWrx_xHSNugz9DDryV-XAbMwFuzQgpIJE=.b1d10622-d1c9-4e21-8d05-4ada5cf605ec@github.com> Message-ID: <rbjUHcMTBVqqmi0KImuzs0jVOp94PBFVxvsRsv1_0Iw=.89564d70-23c9-4c6a-9cfd-38f2af702b88@github.com> On Wed, 16 Nov 2022 10:57:49 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 185: >> >>> 183: * would have to be regenerated after a blocking operation. >>> 184: * >>> 185: * @param <T> the type of the value >> >> Suggestion: >> >> * @param <T> the type of the scoped value > > Mm, but this is the type of the value of the `ScopedValue` instance. > So, the type of the scoped value is `ScopedValue<T>`, the type of the value is `T`, is it not? 
Right - there's "scoped value" which is the holder, and "value of the scoped value" which is what has the type T. You are correct that just dropping "scoped" in there doesn't make things better. Conversely, just leaving "value" and nothing else (as per PR) is ambiguous. It feels like we need some way (uniform throughout the javadoc) to speak about "the value associated to a scoped value instance". ------------- PR: https://git.openjdk.org/jdk/pull/10952 From mcimadamore at openjdk.org Wed Nov 16 11:45:43 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 16 Nov 2022 11:45:43 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <b96D_PGxmI4uKC9L9GBappJWwfZS_ken4e7ho3HCoA4=.5b5db3c9-316d-46a9-9e7f-3f85ff46d73b@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <XEqyhW1QAJKrizAC-gOblrN7Pa1F_f2C8I5XO9RJqBM=.bc08c201-9382-43a6-a68b-b809028524a8@github.com> <CRejKgLiOJHhFPCEfxGBnnNGZiFhES4Ac9Dqtgq2VWY=.12932aae-9015-4452-9a7c-4c59b1981495@github.com> <b96D_PGxmI4uKC9L9GBappJWwfZS_ken4e7ho3HCoA4=.5b5db3c9-316d-46a9-9e7f-3f85ff46d73b@github.com> Message-ID: <8Gol6phltQIqgGpXbVDn_iUDtqyRm8NKy_U63w2oQ8g=.ba7a834d-bb03-4a20-ae69-4371068f1439@github.com> On Wed, 16 Nov 2022 11:03:07 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Yes, my comment was really on `get` - that said, I note that saying get().get() would look odd (but maybe finding some other name for `ScopedValue::get`, such as `find` might work) > > It certainly would look odd. This API is, by design, as lightweight as it possibly can be, both from an implementation and a user's point of view. It's also intended to be as close as possible to an "invisible" parameter passed to all callees. 
From that point of view, `get()` is a wart. `get().get()` is just... IMHO there are ways to have the cake and eat it too. That is, we could have a couple of overloads: T get() { ... } // throws NSME if not found Optional<T> find() // returns empty optional if not found Then, for simple use cases, code will stay the same as today. But, if users want to deal with optionality explicitly, they can call `find` and then call `orElse`, `map` or whatever they like. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From alanb at openjdk.org Wed Nov 16 12:04:00 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 16 Nov 2022 12:04:00 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <8Gol6phltQIqgGpXbVDn_iUDtqyRm8NKy_U63w2oQ8g=.ba7a834d-bb03-4a20-ae69-4371068f1439@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <XEqyhW1QAJKrizAC-gOblrN7Pa1F_f2C8I5XO9RJqBM=.bc08c201-9382-43a6-a68b-b809028524a8@github.com> <CRejKgLiOJHhFPCEfxGBnnNGZiFhES4Ac9Dqtgq2VWY=.12932aae-9015-4452-9a7c-4c59b1981495@github.com> <b96D_PGxmI4uKC9L9GBappJWwfZS_ken4e7ho3HCoA4=.5b5db3c9-316d-46a9-9e7f-3f85ff46d73b@github.com> <8Gol6phltQIqgGpXbVDn_iUDtqyRm8NKy_U63w2oQ8g=.ba7a834d-bb03-4a20-ae69-4371068f1439@github.com> Message-ID: <xviMqszPHdP7wFjNGMk5u5o81rC-KK8GWRQQwAA642E=.306213e0-8b8b-4f96-9ee3-f7eac92015b4@github.com> On Wed, 16 Nov 2022 11:41:49 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> It certainly would look odd. This API is, by design, as lightweight as it possibly can be, both from an implementation and a user's point of view. It's also intended to be as close as possible to an "invisible" parameter passed to all callees. 
From that point of view, `get()` is a wart. `get().get()` is just... > > IMHO there are ways to have the cake and eat it too. That is, we could have a couple of overloads: > > > T get() { ... } // throws NSME if not found > Optional<T> find() // returns empty optional if not found > > > Then, for simple use cases, code will stay the same as today. But, if users want to deal with optionality explicitly, they can call `find` and then call `orElse`, `map` or whatever they like. We expect isBound() will be used a lot and I think that is clearer (and cheaper) than find().isEmpty(). Time will tell on orElse/orElseThrow and whether they should be replaced with an Optional view. That is, I think your comments mostly apply to those two methods rather than get/isBound. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From fyang at openjdk.org Wed Nov 16 12:05:25 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 16 Nov 2022 12:05:25 GMT Subject: RFR: 8296916: RISC-V: Move some small macro-assembler functions to header file [v2] In-Reply-To: <vFrz5OWLa1nQv_kMedGi3abjJ1U3AEn4UiieFcOicdU=.f49a2a23-6326-473d-8363-6b7f2d196724@github.com> References: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> <4HhSvQNYeFOwFEKahHQuORkDDT7q8_Ihyb8jlGzo5aY=.cefa5cc4-f3d6-4191-bd9e-b35582752cf4@github.com> <vFrz5OWLa1nQv_kMedGi3abjJ1U3AEn4UiieFcOicdU=.f49a2a23-6326-473d-8363-6b7f2d196724@github.com> Message-ID: <r2vsVTAXGckQ4Fmus6b8JWTE8IM0egk8bM_uHAi49nc=.2a6090a6-c4ee-4462-b4f9-e60dafcc3df8@github.com> On Wed, 16 Nov 2022 07:37:38 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > But this is okay as well. Thanks. I will take another look to see if we can further improve this after this one is merged. 
------------- PR: https://git.openjdk.org/jdk/pull/11130 From fyang at openjdk.org Wed Nov 16 12:05:27 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 16 Nov 2022 12:05:27 GMT Subject: Integrated: 8296916: RISC-V: Move some small macro-assembler functions to header file In-Reply-To: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> References: <7xQWoiVrawRRQv6YD75Yos_SmmvnWzFasAGRYkapb8M=.25505875-9195-481c-8408-ff92edb76a6b@github.com> Message-ID: <A2VCBN8rnrJLYytM6OoXph_m0JOaRl9kIV1F831r__s=.e4eff1b0-8a3c-4784-a342-1fad77f3dd8f@github.com> On Mon, 14 Nov 2022 02:19:30 GMT, Fei Yang <fyang at openjdk.org> wrote: > Witnessed that there are some small macro-assembler functions located in file macroAssembler_riscv.cpp. > These are small functions which mostly contain only a single line of code. We should move them to the > corresponding header file so that they have a chance to be inlined. > > Testing: Tier1 on linux-riscv64 HiFive unmatched board. This pull request has now been integrated. 
Changeset: c3b285a8 Author: Fei Yang <fyang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/c3b285a8acaf4a6771e80b0a19bf21d6873f1a38 Stats: 292 lines in 3 files changed: 106 ins; 146 del; 40 mod 8296916: RISC-V: Move some small macro-assembler functions to header file Reviewed-by: fjiang, yadongwang, shade ------------- PR: https://git.openjdk.org/jdk/pull/11130 From xlinzheng at openjdk.org Wed Nov 16 12:09:48 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 16 Nov 2022 12:09:48 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default [v3] In-Reply-To: <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> Message-ID: <X_7wb2ueSEVcn9fgx95jhuhI3rLElAZDqDVUbQDspws=.3f8ba680-cb16-4230-b36f-e88f718c278c@github.com> On Wed, 16 Nov 2022 05:21:52 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. 
>> >> >>> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" >> bool UseRVA20U64 = true {ARCH product} {default} >> bool UseRVC = true {ARCH product} {default} >> openjdk version "20-internal" 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html >> [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > minor issue if users specify command line -XX:+UseRVA20U64 and RVC is not supported > I mean something like [d75b565](https://github.com/openjdk/jdk/commit/d75b565dabc1dab3c508d2b4b83d34af5a1c7a35) (it hasn't been built or tested). Thanks for the explanation :-) I see, and your solution looks far better than mine. But making `UseRVA20U64` as default true is an opinion from @VladimirKempik discussed in the mailing list, so I would also like to ask if he is okay with this. (I guess the `vm_exit_during_initialization` would make `UseRVA20U64` hard to become a default option though, for Java should start normally anywhere.) 
------------- PR: https://git.openjdk.org/jdk/pull/11155 From mcimadamore at openjdk.org Wed Nov 16 12:14:14 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 16 Nov 2022 12:14:14 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <xviMqszPHdP7wFjNGMk5u5o81rC-KK8GWRQQwAA642E=.306213e0-8b8b-4f96-9ee3-f7eac92015b4@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <XEqyhW1QAJKrizAC-gOblrN7Pa1F_f2C8I5XO9RJqBM=.bc08c201-9382-43a6-a68b-b809028524a8@github.com> <CRejKgLiOJHhFPCEfxGBnnNGZiFhES4Ac9Dqtgq2VWY=.12932aae-9015-4452-9a7c-4c59b1981495@github.com> <b96D_PGxmI4uKC9L9GBappJWwfZS_ken4e7ho3HCoA4=.5b5db3c9-316d-46a9-9e7f-3f85ff46d73b@github.com> <8Gol6phltQIqgGpXbVDn_iUDtqyRm8NKy_U63w2oQ8g=.ba7a834d-bb03-4a20-ae69-4371068f1439@github.com> <xviMqszPHdP7wFjNGMk5u5o81rC-KK8GWRQQwAA642E=.306213e0-8b8b-4f96-9ee3-f7eac92015b4@github.com> Message-ID: <A9xOoYTgicpAXaATLn1jLW1X0pP7ZlDRh0ISp9E1H3M=.7ee59a3e-5e66-43a0-89b2-fcfa4674b449@github.com> On Wed, 16 Nov 2022 12:01:58 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> IMHO there are ways to have the cake and eat it too. That is, we could have a couple of overloads: >> >> >> T get() { ... } // throws NSME if not found >> Optional<T> find() // returns empty optional if not found >> >> >> Then, for simple use cases, code will stay the same as today. But, if users want to deal with optionality explicitly, they can call `find` and then call `orElse`, `map` or whatever they like. > > We expect isBound() will be used a lot and I think that is clearer (and cheaper) than find().isEmpty(). > > Time will tell on orElse/orElseThrow and whether they should be replaced with an Optional view. 
That is, I think your comments mostly apply to those two methods rather than get/isBound. Note that `isBound` can also be explained as just a shortcut for `find().isEmpty()`. That is, I'm not really suggesting to drop sugary methods from the API. But by exposing the optional nature of the result, the API might end up being more composable. But, as you said, we don't have to decide now. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From stefank at openjdk.org Wed Nov 16 12:26:01 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 12:26:01 GMT Subject: RFR: 8296774: Removed default MEMFLAGS value from CHeapBitMap [v3] In-Reply-To: <ZTeey3fKU3nGct4NWNzjPeWMXEnytpMmoVFG-f2GW7Y=.6caf0cbd-8c4d-49ae-aeeb-a0681e759c75@github.com> References: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com> <ZTeey3fKU3nGct4NWNzjPeWMXEnytpMmoVFG-f2GW7Y=.6caf0cbd-8c4d-49ae-aeeb-a0681e759c75@github.com> Message-ID: <_Ewao01oZRZjNUwyx2DcKAGvO1i7jvdPztIoIiftxtk=.9ff01301-75b4-4c85-93c6-160ba6d0357e@github.com> On Tue, 15 Nov 2022 10:11:20 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today it is easy to accidentally create CHeapBitMaps that use the default mtInternal MEMFLAGS instead of a value that is suitable for the subsystem. I fixed the instances I could find with #10948 / [JDK-8296231](https://bugs.openjdk.org/browse/JDK-8296231). >> >> For that PR I didn't want to change the constructors of the bitmap because #10941 / [JDK-8296139](https://bugs.openjdk.org/browse/JDK-8296139) was out for review. Now that that change has been pushed I'd like to change the constructors of the CHeapBitMap, so that we don't accidentally make these mistakes. >> >> When making it mandatory to pass MEMFLAGS, it becomes apparent that the current parameter order is a bit odd. If you look closely you see that all three parameters are optional.
When I now want to make MEMFLAGS mandatory, I'd like to move it so that it always is the first parameter. This will simplify the constructors a bit, IMHO. >> >> This is what the constructors look like before the patch: >> >> CHeapBitMap() : CHeapBitMap(mtInternal) {} >> explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} >> CHeapBitMap(idx_t size_in_bits, MEMFLAGS flags = mtInternal, bool clear = true); >> >> >> And I'd like to change it to: >> >> explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} >> CHeapBitMap(MEMFLAGS flags, idx_t size_in_bits, bool clear = true); >> >> >> In effect, this makes `flags` mandatory and `size_in_bits` and `clear` optional. >> >> We could probably condense this even further into just one constructor: >> >> explicit CHeapBitMap(MEMFLAGS flags, size_t size_in_bits = 0, bool clear = true) : GrowableBitMap(size_in_bits, clear), _flags(flags) {} >> >> >> given that the value of `clear` doesn't matter when `size_in_bits` is 0. I didn't do that, but could be swayed to do that. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Align parameter order with GrowableArray changes Thanks for the reviews! 
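For readers following along, the constructor shape discussed above can be illustrated with a small stand-alone sketch. This is not the HotSpot code: `MemFlagSketch` and `BitMapSketch` are hypothetical stand-ins for `MEMFLAGS` and `CHeapBitMap`, and the sketch always zero-initializes its storage, so `clear` is accepted only to mirror the signature. It shows the condensed single-constructor variant, where the flag comes first and has no default.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for HotSpot's MEMFLAGS; enumerators are illustrative only.
enum class MemFlagSketch { Internal, GC, Compiler };

// Toy bitmap that only mirrors the constructor shape under discussion.
class BitMapSketch {
 public:
  // The flag is the first parameter and has no default value, so callers
  // must always state which subsystem the memory belongs to.
  explicit BitMapSketch(MemFlagSketch flags, std::size_t size_in_bits = 0, bool clear = true)
      : _flags(flags), _bits(size_in_bits, false) {
    (void)clear;  // the sketch always clears; the parameter only mirrors the signature
  }

  MemFlagSketch flags() const { return _flags; }
  std::size_t size() const { return _bits.size(); }

 private:
  MemFlagSketch _flags;
  std::vector<bool> _bits;
};

// Neither "no arguments" nor "size only" compiles any more.
static_assert(!std::is_default_constructible<BitMapSketch>::value,
              "a flag must always be supplied");
static_assert(!std::is_constructible<BitMapSketch, std::size_t>::value,
              "a size alone is not enough");
```

With this shape, `BitMapSketch bm(MemFlagSketch::GC, 64);` compiles, while `BitMapSketch bm;` and `BitMapSketch bm(64);` are rejected at compile time, which is the kind of accidental default the change is meant to rule out.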
------------- PR: https://git.openjdk.org/jdk/pull/11084 From stuefe at openjdk.org Wed Nov 16 12:29:17 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 12:29:17 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> Message-ID: <3CD6ryhzVPUpiJTkeepDevEr96HfvCkLdCR2AzmxqoA=.db139ad6-cbb6-4791-a6d1-4d28e54fdcd1@github.com> On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. 
>> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed Changes requested by stuefe (Reviewer). src/hotspot/share/utilities/elfFile.cpp line 1605: > 1603: uint32_t current_index = 1; // file_names start at index 1 > 1604: const size_t dwarf_filename_len = 1024; > 1605: char dwarf_filename[dwarf_filename_len]; // Store the filename read from DWARF which is then copied to 'filename'. Putting such a large array on the stack is a bit borderline. Especially in error reporting, where you may build up stack repeatedly via secondary crash handling. I realize though that no good alternatives exist. C-heap may be corrupted, ResourceArea is also out of the question. Could we get away with using filename directly? 
src/hotspot/share/utilities/elfFile.cpp line 1636: > 1634: char* last_slash = strrchr(filename, *os::file_separator()); > 1635: if (last_slash != nullptr) { > 1636: uint16_t index_after_slash = (uint16_t)(last_slash + 1 - filename); Why uint16_t? We have `pointer_delta()` for that btw if you want to be super correct. See globalDefinitions.hpp src/hotspot/share/utilities/elfFile.cpp line 1638: > 1636: uint16_t index_after_slash = (uint16_t)(last_slash + 1 - filename); > 1637: // Copy filename to beginning of buffer. > 1638: int bytes_written = jio_snprintf(filename, filename_len - index_after_slash, "%s", filename + index_after_slash); I don't think this is guaranteed to work since the memory areas you move may interleave. You should copy char-wise, or use `memmove(3)`. src/hotspot/share/utilities/elfFile.cpp line 1651: > 1649: int bytes_written = jio_snprintf(dst, count, "%s", src); > 1650: // Add null terminator. > 1651: dst[count - 1] = '\0'; Does it make sense to return a truncated file name up to the caller of `DwarfFile::LineNumberProgram::get_filename_from_header()`? Will this not just confuse him? I think it makes more sense to cleanly handle truncation, and e.g. skip file parsing for dwarf files with too long names. src/hotspot/share/utilities/elfFile.hpp line 865: > 863: bool get_filename_from_header(uint32_t file_index, char* filename, size_t filename_len); > 864: static void strip_path_prefix(char* filename, const size_t filename_len); > 865: static void copy_dwarf_filename_to_filename(char* src, size_t src_len, char* dst, size_t dst_len); Stupid question, do these have to be exposed? Or could they be just static helpers in elfFile.cpp? 
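As an illustration of the overlap concern raised for line 1638, here is a minimal stand-alone sketch of stripping a path prefix in place with `memmove`. It is not the HotSpot implementation: `strip_path_prefix_sketch` is a hypothetical helper, and it takes the separator as a parameter instead of calling `os::file_separator()`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not the code under review: removes everything up to
// and including the last separator, operating in place on 'filename'.
static void strip_path_prefix_sketch(char* filename, char separator) {
  assert(filename != nullptr);
  char* last_sep = std::strrchr(filename, separator);
  if (last_sep == nullptr) {
    return;  // no directory part, nothing to strip
  }
  const char* base = last_sep + 1;
  // Source and destination overlap within the same buffer. memmove is
  // specified to handle that; memcpy and the printf family are not.
  std::memmove(filename, base, std::strlen(base) + 1);  // +1 copies the terminator
}
```

Because the result is never longer than the input, no truncation handling is needed for this step; it only arises when copying into a smaller destination buffer.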
------------- PR: https://git.openjdk.org/jdk/pull/10287 From stefank at openjdk.org Wed Nov 16 12:29:54 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 12:29:54 GMT Subject: Integrated: 8296774: Removed default MEMFLAGS value from CHeapBitMap In-Reply-To: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com> References: <zvNQtugAdQlxKWzCbzM9pHtTFEA0DrV7LNLDjTy5bpU=.70d5dff5-e96f-4fb8-abf9-5a08c1f5b22f@github.com> Message-ID: <inv9i9xftlCkf5af3T7is8pl3_mJ7ON6ugQP4mnj31o=.1bbcce2b-0e14-47e3-925d-d2253b8e9e01@github.com> On Thu, 10 Nov 2022 09:26:34 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Today it is easy to accidentally create CHeapBitMaps that uses the default mtInternal MEMFLAGS instead of a value that is suitable for the subsystem. I fixed the instances I could find with #10948 / [JDK-8296231](https://bugs.openjdk.org/browse/JDK-8296231). > > For that PR I didn't want to change the constructors of the bitmap because #10941 / [JDK-8296139](https://bugs.openjdk.org/browse/JDK-8296139) was being out for review. Now when that change has been pushed I'd like to change the constructors of the CHeapBitMap, so that we don't accidentally make these mistakes. > > When making it mandatory to pass MEMFLAGS, it becomes apparent that the current parameter order is a bit odd. If you look closely you see that all three parameters are optional. When I now want to make MEMFLAGS mandatory, I'd like to move it so that it always is the first parameter. This will simplify the constructors a bit, IMHO. 
> > This is what the constructors look like before the patch: > > CHeapBitMap() : CHeapBitMap(mtInternal) {} > explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} > CHeapBitMap(idx_t size_in_bits, MEMFLAGS flags = mtInternal, bool clear = true); > > > And I'd like to change it to: > > explicit CHeapBitMap(MEMFLAGS flags) : GrowableBitMap(0, false), _flags(flags) {} > CHeapBitMap(MEMFLAGS flags, idx_t size_in_bits, bool clear = true); > > > In effect, this makes `flags` mandatory and `size_in_bits` and `clear` optional. > > We could probably condense this even further into just one constructor: > > explicit CHeapBitMap(MEMFLAGS flags, size_t size_in_bits = 0, bool clear = true) : GrowableBitMap(size_in_bits, clear), _flags(flags) {} > > > given that the value of `clear` doesn't matter when `size_in_bits` is 0. I didn't do that, but could be swayed to do that. This pull request has now been integrated. Changeset: 8cdcec44 Author: Stefan Karlsson <stefank at openjdk.org> URL: https://git.openjdk.org/jdk/commit/8cdcec44d81504978dfdfa8e2277907e4b9688ee Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod 8296774: Removed default MEMFLAGS value from CHeapBitMap Reviewed-by: lkorinth, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/11084 From stefank at openjdk.org Wed Nov 16 12:35:30 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 12:35:30 GMT Subject: RFR: 8296785: Use realloc for CHeap-allocated BitMaps [v2] In-Reply-To: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> References: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> Message-ID: <VCm7nSBm1Dgw9LSHBg1FBDiu10MxK-0Po-tLt2mk-ac=.523c8f78-2f4c-4786-a5e1-d90f633dfebd@github.com> > Today CHeap allocated bitmaps don't resize with realloc. 
I'd like to change that by fixing that by adding support for realloc in the ArrayAllocator classes, and then use that when resizing the bitmaps. > > We've been using and testing one version of this patch in the Generational ZGC repository for a while now. That version is slightly different because of recent rewrites of the bitmaps, but in essence the same. See: > https://github.com/openjdk/zgc/commit/ca692f686bda8d86d3786c2afc782bfdc54fbdfc Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - 8296785: Use realloc for CHeap-allocated BitMaps - Merge remote-tracking branch 'upstream/master' into 8296774_bitmap_stricter_construction - 8296774: Removed default MEMFLAGS value from CHeapBitMap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11102/files - new: https://git.openjdk.org/jdk/pull/11102/files/f13888b8..f13888b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11102&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11102&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11102.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11102/head:pull/11102 PR: https://git.openjdk.org/jdk/pull/11102 From vkempik at openjdk.org Wed Nov 16 12:50:06 2022 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 16 Nov 2022 12:50:06 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default [v3] In-Reply-To: <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> Message-ID: 
<m37euNqGEp7TT0jUAT97NoU6WtRfcJ7vWEP_Raa84-g=.1a5bea2d-8f21-4b07-96f4-3e798f01d63a@github.com> On Wed, 16 Nov 2022 05:21:52 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. >> >> >>> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" >> bool UseRVA20U64 = true {ARCH product} {default} >> bool UseRVC = true {ARCH product} {default} >> openjdk version "20-internal" 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html >> [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > minor issue if users specify command line -XX:+UseRVA20U64 and RVC is not supported Well, did you see any rv64g hardware without Compressed Opcodes support ? Of course some may configure custom cpu core without RVC support, but it's usually done for custom environments where even linux is not present. as was mentioned in ML, the rva20u64 profile also requires unaligned memory access support, which is still implicit requirement of jdk on risc-v ( see https://bugs.openjdk.org/browse/JDK-8291550 ), some cpus support it in hardware, some support m-mode emulator only. Basically we are safe to enable rva20u64 by default I think. Nothing will change or break, and we can catch some RVC related bugs. 
if anything we can change that logic ------------- PR: https://git.openjdk.org/jdk/pull/11155 From duke at openjdk.org Wed Nov 16 12:52:10 2022 From: duke at openjdk.org (ExE Boss) Date: Wed, 16 Nov 2022 12:52:10 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <A9xOoYTgicpAXaATLn1jLW1X0pP7ZlDRh0ISp9E1H3M=.7ee59a3e-5e66-43a0-89b2-fcfa4674b449@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <XEqyhW1QAJKrizAC-gOblrN7Pa1F_f2C8I5XO9RJqBM=.bc08c201-9382-43a6-a68b-b809028524a8@github.com> <CRejKgLiOJHhFPCEfxGBnnNGZiFhES4Ac9Dqtgq2VWY=.12932aae-9015-4452-9a7c-4c59b1981495@github.com> <b96D_PGxmI4uKC9L9GBappJWwfZS_ken4e7ho3HCoA4=.5b5db3c9-316d-46a9-9e7f-3f85ff46d73b@github.com> <8Gol6phltQIqgGpXbVDn_iUDtqyRm8NKy_U63w2oQ8g=.ba7a834d-bb03-4a20-ae69-4371068f1439@github.com> <xviMqszPHdP7wFjNGMk5u5o81rC-KK8GWRQQwAA642E=.306213e0-8b8b-4f96-9ee3-f7eac92015b4@github.com> <A9xOoYTgicpAXaATLn1jLW1X0pP7ZlDRh0ISp9E1H3M=.7ee59a3e-5e66-43a0-89b2-fcfa4674b449@github.com> Message-ID: <okStohonRqnvGd9Sx3M4vIfuwrDfZW-9bHUQn-BrtKk=.8c8f166d-025a-43b9-88d7-6e7882ac1394@github.com> On Wed, 16 Nov 2022 12:11:45 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> We expect isBound() will be used a lot and I think that is clearer (and cheaper) than find().isEmpty(). >> >> Time will tell on orElse/orElseThrow and whether they should be replaced with an Optional view. That is, I think your comments mostly apply to those two methods rather than get/isBound. > > Note that `isBound` can also be explained as just a shorcut for `find().isEmpty()`. That is, I'm not really suggesting to drop sugary methods from the API. 
But by exposing the optional nature of the result, the API might end up being more composable. > > But, as you said, we don't have to decide now. Note that `ScopedValue` can currently be bound to `null`, but by using `Optional`, there would be no way to differentiate an unbound `ScopedValue` from one bound to `null`. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From stefank at openjdk.org Wed Nov 16 12:52:55 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 12:52:55 GMT Subject: RFR: 8296926: Sort include lines of files in the include/ directory [v4] In-Reply-To: <Ui-S20RkgTtaXg8YyowdihZ7MAxV8yTEidms18YpKUQ=.adc86da6-1326-448d-ad8e-73be100a6d14@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> <Ui-S20RkgTtaXg8YyowdihZ7MAxV8yTEidms18YpKUQ=.adc86da6-1326-448d-ad8e-73be100a6d14@github.com> Message-ID: <0QivfI9yo0RhJYISYtMy3GXN9KY_YlX0GKyLaMalT3E=.d92f4faf-0c02-416d-9840-73d6826d5ad4@github.com> On Wed, 16 Nov 2022 11:05:59 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK to have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot.
Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. >> >> This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Remove include/ from test/hotspot files > - Merge remote-tracking branch 'upstream/master' into 8296926_proper_include_lines_for_include_dir_files > - Revert make file changes > - Remove include/ from includes > - 8296926: Use proper include lines for files in include/ Thanks for the reviews and discussions. ------------- PR: https://git.openjdk.org/jdk/pull/11133 From vkempik at openjdk.org Wed Nov 16 12:54:01 2022 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 16 Nov 2022 12:54:01 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default [v3] In-Reply-To: <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> Message-ID: <5jNzRnIvwHH7OXzfSru1G-pnxg-FwoHDe1kHIwZrP7A=.761942ff-b752-4427-9a95-083cb8e1d941@github.com> On Wed, 16 Nov 2022 05:21:52 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this.
>> >> >>> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" >> bool UseRVA20U64 = true {ARCH product} {default} >> bool UseRVC = true {ARCH product} {default} >> openjdk version "20-internal" 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html >> [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > minor issue if users specify command line -XX:+UseRVA20U64 and RVC is not supported Marked as reviewed by vkempik (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/11155 From stefank at openjdk.org Wed Nov 16 12:54:40 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 12:54:40 GMT Subject: Integrated: 8296926: Sort include lines of files in the include/ directory In-Reply-To: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> Message-ID: <WaXoihwHebbvEEytUbV4-E6YWt5vZ0XOQZKBZl-B49Y=.d7181ee2-8a1d-4498-a081-9fe35a57459b@github.com> On Mon, 14 Nov 2022 09:25:11 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. 
One argument why it was OK to have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. This pull request has now been integrated. Changeset: 813b223a Author: Stefan Karlsson <stefank at openjdk.org> URL: https://git.openjdk.org/jdk/commit/813b223a6bcd9f6290ee9c8840a8c69061ade48c Stats: 231 lines in 113 files changed: 111 ins; 117 del; 3 mod 8296926: Sort include lines of files in the include/ directory Reviewed-by: kbarrett, erikj, lkorinth ------------- PR: https://git.openjdk.org/jdk/pull/11133 From coleenp at openjdk.org Wed Nov 16 12:59:11 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 16 Nov 2022 12:59:11 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v6] In-Reply-To: <HeTLbgd9vgQNCDQ4k7CVyv4xd8dJF8H9g0APM3GOXr4=.629511e6-bb14-4bd7-b8bd-f0f34dd044ea@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> <HeTLbgd9vgQNCDQ4k7CVyv4xd8dJF8H9g0APM3GOXr4=.629511e6-bb14-4bd7-b8bd-f0f34dd044ea@github.com> Message-ID: <QC70yl4pepxz5CEE39lPbq82dKGASqMP05AsuZrKDN8=.438439f8-eefc-4760-ae06-f748b10d3b9e@github.com> On Wed, 16 Nov 2022 03:54:31 GMT, Serguei Spitsyn <sspitsyn at
openjdk.org> wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments. > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 564: > >> 562: >> 563: for (int i = 0; i < length; i++) { >> 564: objArray[i] = (jthreadGroup)JNIHandles::make_local(groups->obj_at(i)); > > Nit: It is better to use `jni_reference` instead of `JNIHandles::make_local` for consistency as at the line 549. jni_reference takes a Handle and this passes a Handle to an objArray so groups->obj_at(i) is an oop and it's a waste to make it a handle for jni_reference to just unhandlize it. ------------- PR: https://git.openjdk.org/jdk/pull/11033 From stuefe at openjdk.org Wed Nov 16 12:59:22 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 12:59:22 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v7] In-Reply-To: <lTTxdskiTr0w5EmZ0xKWQlSBCfhePMpHV18cpzWh_pE=.2da73c6b-cd06-4a9b-89ba-213fc10cb8f5@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <lTTxdskiTr0w5EmZ0xKWQlSBCfhePMpHV18cpzWh_pE=.2da73c6b-cd06-4a9b-89ba-213fc10cb8f5@github.com> Message-ID: <EIOQyCxaCdzqtip4DDg1ob7mj-bA7BniVgTJbKt-acg=.e8256ee9-7029-4073-8b4d-b97914513886@github.com> On Wed, 16 Nov 2022 07:03:12 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
>> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > address review comments src/hotspot/os/bsd/attachListener_bsd.cpp line 295: > 293: char msg[32]; > 294: int msg_len = os::snprintf(msg, sizeof(msg), "%d\n", ATTACH_ERROR_BADVERSION); > 295: write_fully(s, msg, msg_len); Assuming C99 behavior: safe but only because the buffer is large enough ("%d\n" needs at most 12 bytes, buffer is 32). Were it to overflow, msg_len would be larger than sizeof(msg) and we would probably end up reading beyond the message end in write_fully. So not really better than using sprintf+strlen. src/hotspot/os/bsd/attachListener_bsd.cpp line 415: > 413: char msg[32]; > 414: int msg_len = os::snprintf(msg, sizeof(msg), "%d\n", result); > 415: int rc = BsdAttachListener::write_fully(this->socket(), msg, msg_len); same src/hotspot/share/adlc/output_c.cpp line 217: > 215: const PipeClassOperandForm *tmppipeopnd = > 216: (const PipeClassOperandForm *)pipeclass->_localUsage[paramname]; > 217: templen += snprintf(&operand_stages[templen], operand_stages_size - templen, " stage_%s%c\n", C99 Behavior: all these are probably safe but only because we never overstepped the buffer in the first place, the buffer size is pre-calculated. If it is incorrect and we have a truncation, subsequent writes will write beyond the buffer. src/hotspot/share/classfile/javaClasses.cpp line 2527: > 2525: > 2526: // Print stack trace line in buffer > 2527: size_t buf_off = os::snprintf(buf, buf_size, "\tat %s.%s(", klass_name, method_name); Here, and in subsequent uses: assuming C99 behavior of snprintf, if we truncated in snprintf, buf_off will be > buffer size, (buf + buf_off) point beyond the buffer, (buf_size - buf_off) will overflow and become very large. 
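A minimal sketch of the hazard described in these comments, assuming C99 `snprintf` semantics (the return value is the length the output would have had, not the number of bytes actually stored). `raw_snprintf_len` and `checked_append` are hypothetical helpers written for illustration only; they are not existing HotSpot or `os::` functions, and the patch under review may handle this differently:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Returns the raw C99 result: the length the formatted text would have had.
// On truncation this is >= size, so using it directly as an offset is unsafe.
static int raw_snprintf_len(char* buf, std::size_t size, const char* name) {
  return std::snprintf(buf, size, "\tat %s(", name);
}

// Hypothetical helper: append formatted text at *off inside buf[0..size).
// Returns true if everything fit. On truncation or error it returns false
// and keeps *off <= size - 1, so (size - *off) can never wrap around.
static bool checked_append(char* buf, std::size_t size, std::size_t* off,
                           const char* fmt, ...) {
  assert(buf != nullptr && size > 0 && *off < size);
  va_list ap;
  va_start(ap, fmt);
  int n = std::vsnprintf(buf + *off, size - *off, fmt, ap);
  va_end(ap);
  if (n < 0) {
    buf[*off] = '\0';   // encoding error: keep the text written so far
    return false;
  }
  if (static_cast<std::size_t>(n) >= size - *off) {
    *off = size - 1;    // truncated: vsnprintf stored its NUL at buf[size - 1]
    return false;
  }
  *off += static_cast<std::size_t>(n);
  return true;
}
```

With an 8-byte buffer, appending "abc" and then "defghijk" through `checked_append` leaves the offset clamped at 7 and the buffer holding "abcdefg", so a following call computes a remaining size of 1 instead of a huge wrapped value.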
------------- PR: https://git.openjdk.org/jdk/pull/11115 From redestad at openjdk.org Wed Nov 16 13:04:56 2022 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 16 Nov 2022 13:04:56 GMT Subject: RFR: 8296429: Remove os::supports_sse In-Reply-To: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> References: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> Message-ID: <8iwFB3ZKoTilit5tGdKJ-ghI0MjFMDgjfnPeahxJZbc=.5498cb81-0ba4-49db-b3c9-6ed464fe5274@github.com> On Tue, 15 Nov 2022 12:42:48 GMT, Claes Redestad <redestad at openjdk.org> wrote: > os::supports_sse only exists to be backwards compatible with linux kernels older than 2.4, which may not have SSE support. Since support for 2.2.x kernels ended in 2004 I think we can safely clean this out. Explicitly disabling SSE with `-XX:UseSSE=0` is another workaround on such old, unpatched kernels. ------------- PR: https://git.openjdk.org/jdk/pull/11164 From redestad at openjdk.org Wed Nov 16 13:06:36 2022 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 16 Nov 2022 13:06:36 GMT Subject: Integrated: 8296429: Remove os::supports_sse In-Reply-To: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> References: <yr_DVggNIzZZj-Q4u8TAHrrffc3DYVEarGAySiwu4_s=.3f0d36c9-ca59-4243-bc1d-19f9d41d3ab9@github.com> Message-ID: <pmmBkBH6dyoSd9f3AYW2ALWID5BsvBQaSQH1F-l6udA=.bbc724bc-5090-4e3b-a622-8ee4df6d6853@github.com> On Tue, 15 Nov 2022 12:42:48 GMT, Claes Redestad <redestad at openjdk.org> wrote: > os::supports_sse only exists to be backwards compatible with linux kernels older than 2.4, which may not have SSE support. Since support for 2.2.x kernels ended in 2004 I think we can safely clean this out. This pull request has now been integrated. 
Changeset: e72b0ac4 Author: Claes Redestad <redestad at openjdk.org> URL: https://git.openjdk.org/jdk/commit/e72b0ac4affd0bc2151190c4efe207f12a7ebf6a Stats: 37 lines in 6 files changed: 0 ins; 37 del; 0 mod 8296429: Remove os::supports_sse Reviewed-by: stuefe ------------- PR: https://git.openjdk.org/jdk/pull/11164 From coleenp at openjdk.org Wed Nov 16 13:12:58 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 16 Nov 2022 13:12:58 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v6] In-Reply-To: <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> Message-ID: <Tpwt6XDSr5uEXRjXZIDLS7NLQQdAsrlcywwmPntSbu0=.bcd161c3-4583-46f4-b9c2-c13efbbf8eed@github.com> On Tue, 15 Nov 2022 18:52:37 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Review comments. Thanks for reviewing Serguei. 
------------- PR: https://git.openjdk.org/jdk/pull/11033 From alanb at openjdk.org Wed Nov 16 13:16:09 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 16 Nov 2022 13:16:09 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <okStohonRqnvGd9Sx3M4vIfuwrDfZW-9bHUQn-BrtKk=.8c8f166d-025a-43b9-88d7-6e7882ac1394@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <XEqyhW1QAJKrizAC-gOblrN7Pa1F_f2C8I5XO9RJqBM=.bc08c201-9382-43a6-a68b-b809028524a8@github.com> <CRejKgLiOJHhFPCEfxGBnnNGZiFhES4Ac9Dqtgq2VWY=.12932aae-9015-4452-9a7c-4c59b1981495@github.com> <b96D_PGxmI4uKC9L9GBappJWwfZS_ken4e7ho3HCoA4=.5b5db3c9-316d-46a9-9e7f-3f85ff46d73b@github.com> <8Gol6phltQIqgGpXbVDn_iUDtqyRm8NKy_U63w2oQ8g=.ba7a834d-bb03-4a20-ae69-4371068f1439@github.com> <xviMqszPHdP7wFjNGMk5u5o81rC-KK8GWRQQwAA642E=.306213e0-8b8b-4f96-9ee3-f7eac92015b4@github.com> <A9xOoYTgicpAXaATLn1jLW1X0pP7ZlDRh0ISp9E1H3M=.7ee59a3e-5e66-43a0-89b2-fcfa4674b449@github.com> <okStohonRqnvGd9Sx3M4vIfuwrDfZW-9bHUQn-BrtKk=.8c8f166d-025a-43b9-88d7-6e7882ac1394@github.com> Message-ID: <VQfLGLzs4-yMjO1eZMm-rku-Ti63OETdzNtRnaberMs=.8e95e70e-114d-4920-bb31-d3e8f72c502d@github.com> On Wed, 16 Nov 2022 12:50:04 GMT, ExE Boss <duke at openjdk.org> wrote: > Note that `ScopedValue` can currently be bound to `null`, but by using `Optional`, there would be no way to differentiate an unbound `ScopedValue` from one bound to `null`. That's right, an Optional view would have to deal with that.
------------- PR: https://git.openjdk.org/jdk/pull/10952 From eosterlund at openjdk.org Wed Nov 16 14:11:01 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 16 Nov 2022 14:11:01 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code In-Reply-To: <VCyJE9wRumQ6-HNBPaN3nZVV0WHTwaP9euMyiepJfLA=.5258685c-fc2e-465f-aa3d-aea4bc0ecde8@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <K4UA3mF3XBvUjVUy9c15q0CxlTgu0hsH7vpI2FUHz1w=.972d2f82-5819-40f1-a443-dc6c4e94f44b@github.com> <VCyJE9wRumQ6-HNBPaN3nZVV0WHTwaP9euMyiepJfLA=.5258685c-fc2e-465f-aa3d-aea4bc0ecde8@github.com> Message-ID: <O0OnX5vDNHb6WqAtk5ptus6_gLmcRx4A4VXHTnWCKOs=.6bc24955-efe5-43aa-a07d-eabf6cd3a40b@github.com> On Tue, 15 Nov 2022 09:39:27 GMT, Fei Yang <fyang at openjdk.org> wrote: >> PS: I see JVM crashes when running Skynet with extra VM option: -XX:+VerifyContinuations on linux-aarch64 platform. >> >> $java --enable-preview -XX:+VerifyContinuations Skynet >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> >> # after -XX: or in .hotspotrc: SuppressErrorAt=# >> # Internal Error/stackChunkOop.cpp (/home/realfyang/openjdk-jdk/src/hotspot/share/oops/stackChunkOop.cpp:433), pid=1904185:433, tid=1904206 >> >> [thread 1904216 also had an error]# assert(_chunk->bitmap().at(index)) failed: Bit not set at index 208 corresponding to 0x0000000637c512d0 >> >> # >> # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.realfyang.openjdk-jdk) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.realfyang.openjdk-jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > >> @RealFYang did you have a chance to see if my RISC-V changes worked out for you? > > Hi, I have performed tier1-3 tests on my linux-riscv64 HiFive Unmatched boards. Results looks good. 
> Thanks for handling riscv at the same time :-) > > PS: Also passed Skynet test with all GCs plus extra VM options: -XX:+VerifyStack -XX:+VerifyContinuations Thanks @RealFYang! ------------- PR: https://git.openjdk.org/jdk/pull/11111 From aph at openjdk.org Wed Nov 16 14:13:29 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 14:13:29 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v8] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <16r2j-OSTNYGEK74xtcCK4RxnFxJ1fztEjSy5uYZ8EY=.9aab72c6-3aa9-4621-bcfc-b1298c83e69d@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/1bd9d47a..222ddcbc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 16 14:18:01 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 14:18:01 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v9] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: 
<GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <mHmKVPWC_R7fBUpYFSeJXVrYqS29db3Lx6x4j6-gvW8=.503976ec-428b-4351-9428-9664c17a96e6@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Oops ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/222ddcbc..2a2b0cca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From alanb at openjdk.org Wed Nov 16 14:19:12 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 16 Nov 2022 14:19:12 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v6] In-Reply-To: <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> Message-ID: <c6M4qQ6Y6Hun5GLbIKb9U4P9DaRPP9EE6SAOZQESQDQ=.9530cf16-d7f9-466a-a8b6-0ef76c3d1418@github.com> On Tue, 15 Nov 2022 18:52:37 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Review comments. Marked as reviewed by alanb (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/11033 From jnimeh at openjdk.org Wed Nov 16 14:45:03 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Wed, 16 Nov 2022 14:45:03 GMT Subject: RFR: 8247645: ChaCha20 intrinsics In-Reply-To: <Beacw6Zz39Sfy-LwdkOi7Q9scH-i3fqtz_sVygbWdi0=.01208ee7-838d-4a6e-9de5-ad2561c5e337@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <Beacw6Zz39Sfy-LwdkOi7Q9scH-i3fqtz_sVygbWdi0=.01208ee7-838d-4a6e-9de5-ad2561c5e337@github.com> Message-ID: <RgX5yTd-zwfUyUpMnYhlkV-9sl0-ruKVynqlRjThPH8=.556b380d-6271-44d2-8f20-e4eb6cf426a9@github.com> On Fri, 21 Oct 2022 12:29:22 GMT, Andrew Haley <aph at openjdk.org> wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: >> >> - x86_64: AVX, AVX2 and AVX512 >> - aarch64: platforms that support the advanced SIMD instructions >> >> Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. >> >> Special thanks to the folks who have made many helpful comments while this PR was in draft form. > > Please my friend, let's get this finished or I'm going to have to do it myself. Hi @theRealAph, since I have the green light on the x86 side I was wondering if I could get your blessing on the aarch64 side of the house or if you have any other concerns. 
------------- PR: https://git.openjdk.org/jdk/pull/7702 From stuefe at openjdk.org Wed Nov 16 15:14:59 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 15:14:59 GMT Subject: RFR: 8296785: Use realloc for CHeap-allocated BitMaps [v2] In-Reply-To: <VCm7nSBm1Dgw9LSHBg1FBDiu10MxK-0Po-tLt2mk-ac=.523c8f78-2f4c-4786-a5e1-d90f633dfebd@github.com> References: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> <VCm7nSBm1Dgw9LSHBg1FBDiu10MxK-0Po-tLt2mk-ac=.523c8f78-2f4c-4786-a5e1-d90f633dfebd@github.com> Message-ID: <KWDOAI6zJ5tiSTYAejlT2VbSReSMww2lnI376WbKW1c=.b2570da9-02f8-41c2-99e5-e807a14066f5@github.com> On Wed, 16 Nov 2022 12:35:30 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today CHeap allocated bitmaps don't resize with realloc. I'd like to change that by fixing that by adding support for realloc in the ArrayAllocator classes, and then use that when resizing the bitmaps. >> >> We've been using and testing one version of this patch in the Generational ZGC repository for a while now. That version is slightly different because of recent rewrites of the bitmaps, but in essence the same. See: >> https://github.com/openjdk/zgc/commit/ca692f686bda8d86d3786c2afc782bfdc54fbdfc > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - 8296785: Use realloc for CHeap-allocated BitMaps > - Merge remote-tracking branch 'upstream/master' into 8296774_bitmap_stricter_construction > - 8296774: Removed default MEMFLAGS value from CHeapBitMap LGTM. I followed all allocation paths, seems ok. Had to look awhile until I found where we guarantee realloc(0) never returns null. 
src/hotspot/share/utilities/bitMap.cpp line 53: > 51: > 52: return map; > 53: } Could this live inside `GrowableBitmap<T>` as a private static function? ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/11102 From aph at openjdk.org Wed Nov 16 15:47:05 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 15:47:05 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <rbjUHcMTBVqqmi0KImuzs0jVOp94PBFVxvsRsv1_0Iw=.89564d70-23c9-4c6a-9cfd-38f2af702b88@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <gvyVzC0NPYwWrx_xHSNugz9DDryV-XAbMwFuzQgpIJE=.b1d10622-d1c9-4e21-8d05-4ada5cf605ec@github.com> <rbjUHcMTBVqqmi0KImuzs0jVOp94PBFVxvsRsv1_0Iw=.89564d70-23c9-4c6a-9cfd-38f2af702b88@github.com> Message-ID: <F3FOR0XhsuCAvOB5jfKOz3qDcSPZqEIPVJIXT6ZXpYQ=.1f2a45a3-d7ac-4201-a90e-6a88b42c0a8f@github.com> On Wed, 16 Nov 2022 11:38:10 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Mm, but this is the type of the value of the `ScopedValue` instance. >> So, the type of the scoped value is `ScopedValue<T>`, the type of the value is `T`, is it not? > > Right - there's "scoped value" which is the holder, and "value of the scoped value" which is what has the type T. You are correct that just dropping "scoped" in there doesn't make things better. Conversely, just leaving "value" and nothing else (as per PR) is ambiguous. It feels like we need some way (uniform throughout the javadoc) to speak about "the value associated to a scoped value instance". I think that's the "bound value" of the instance or the "value bound to" the `ScopedValue<T>` instance. 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From rrich at openjdk.org Wed Nov 16 15:50:17 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 16 Nov 2022 15:50:17 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v3] In-Reply-To: <tnSpXc7Z_RBHBNFrP-r-Fks1hWzplEn_OIwCwk5Vwo4=.94fb66f1-e927-4b73-b6e2-b8ecc01e903b@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <tnSpXc7Z_RBHBNFrP-r-Fks1hWzplEn_OIwCwk5Vwo4=.94fb66f1-e927-4b73-b6e2-b8ecc01e903b@github.com> Message-ID: <6p2iTiK-RvQtQUUvZHID1kpjZEB1wb72CHEi5X_-zuA=.c447ad16-8a1c-4af3-a062-0b1acbbcc1d0@github.com> On Mon, 14 Nov 2022 16:07:34 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. >> >> In particular, >> 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. >> >> 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill.
>> >> 3) Refactoring the stack chunk allocation code >> >> Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Indentation fix Hi @fisk, I've skimmed the changes. They look good to me. I do have a few comments/questions also. src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp line 876: > 874: > 875: OopMap* map = new OopMap(((int)ContinuationEntry::size() + wordSize) / VMRegImpl::stack_slot_size, 0 /* arg_slots*/); > 876: ContinuationEntry::setup_oopmap(map); I'd suggest to add a comment where the oops are handled. src/hotspot/share/gc/shared/barrierSetStackChunk.cpp line 68: > 66: > 67: virtual void do_oop(oop* p) override { > 68: if (UseCompressedOops) { Wouldn't it be better to hoist the check for `UseCompressedOops`? src/hotspot/share/gc/shenandoah/shenandoahBarrierSetStackChunk.cpp line 30: > 28: > 29: void ShenandoahBarrierSetStackChunk::encode_gc_mode(stackChunkOop chunk, OopIterator* oop_iterator) { > 30: // Nothing to do Shenandoah allows `UseCompressedOops` enabled, doesn't it? Isn't it necessary then to do the encoding as in the super class?
------------- PR: https://git.openjdk.org/jdk/pull/11111 From chagedorn at openjdk.org Wed Nov 16 15:53:12 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 16 Nov 2022 15:53:12 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: <3CD6ryhzVPUpiJTkeepDevEr96HfvCkLdCR2AzmxqoA=.db139ad6-cbb6-4791-a6d1-4d28e54fdcd1@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> <3CD6ryhzVPUpiJTkeepDevEr96HfvCkLdCR2AzmxqoA=.db139ad6-cbb6-4791-a6d1-4d28e54fdcd1@github.com> Message-ID: <-LPuNwZylHWH7M8ifGeNw27j7T7-kQzbKa4y0H6cHvM=.8ea671b0-980d-4a81-94b6-1ddab48c1fea@github.com> On Wed, 16 Nov 2022 12:23:50 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Always read full filename and strip prefix path and only then cut filename to fit output buffer >> - Merge branch 'master' into JDK-8293422 >> - Merge branch 'master' into JDK-8293422 >> - Review comments from Thomas >> - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ >> - 8293422: DWARF emitted by Clang cannot be parsed > > src/hotspot/share/utilities/elfFile.cpp line 1651: > >> 1649: int bytes_written = jio_snprintf(dst, count, "%s", src); >> 1650: // Add null terminator. >> 1651: dst[count - 1] = '\0'; > > Does it make sense to return a truncated file name up to the caller of `DwarfFile::LineNumberProgram::get_filename_from_header()`? Will this not just confuse him? I think it makes more sense to cleanly handle truncation, and e.g. 
skip file parsing for dwarf files with too long names. I think you're right that it's rather unexpected to get an incomplete filename back. But just silently skipping the filename might be confusing as well. We could either print an error (I guess that's useful either way but should only be printed with `TraceDwarfLevel`) or just return a generic "buffer overflow" string as filename instead if it fits into the provided filename buffer. And only if that's not possible we could silently skip the filename. Would that be an option? ------------- PR: https://git.openjdk.org/jdk/pull/10287 From aph at openjdk.org Wed Nov 16 16:01:10 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 16:01:10 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> Message-ID: <FHV54avkRj2f7vf-_chBEqAAVP3r-CZpU9PfJlldWmU=.87413912-0826-4d80-8f71-0134ddd2994a@github.com> On Tue, 15 Nov 2022 19:15:21 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix failing serviceability tests > > src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 460: > >> 458: * } >> 459: * >> 460: * @param key the ScopedValue key > > should use `@code` or `@link` Sorry, I don't understand what you want. 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From chagedorn at openjdk.org Wed Nov 16 16:04:46 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 16 Nov 2022 16:04:46 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: <3CD6ryhzVPUpiJTkeepDevEr96HfvCkLdCR2AzmxqoA=.db139ad6-cbb6-4791-a6d1-4d28e54fdcd1@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> <3CD6ryhzVPUpiJTkeepDevEr96HfvCkLdCR2AzmxqoA=.db139ad6-cbb6-4791-a6d1-4d28e54fdcd1@github.com> Message-ID: <huGJVrBNvoINQMJiajYJhBLeD6LpYFKUreKCmpEaf48=.693210ab-4185-48f8-91ad-ed68b399b416@github.com> On Wed, 16 Nov 2022 11:59:19 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Always read full filename and strip prefix path and only then cut filename to fit output buffer >> - Merge branch 'master' into JDK-8293422 >> - Merge branch 'master' into JDK-8293422 >> - Review comments from Thomas >> - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ >> - 8293422: DWARF emitted by Clang cannot be parsed > > src/hotspot/share/utilities/elfFile.cpp line 1605: > >> 1603: uint32_t current_index = 1; // file_names start at index 1 >> 1604: const size_t dwarf_filename_len = 1024; >> 1605: char dwarf_filename[dwarf_filename_len]; // Store the filename read from DWARF which is then copied to 'filename'. > > Putting such a large array on the stack is a bit borderline. 
Especially in error reporting, where you may build up stack repeatedly via secondary crash handling. I realize though that no good alternatives exist. C-heap may be corrupted, ResourceArea is also out of the question. Could we get away with using filename directly? That's true. Maybe I can change the algorithm to read single characters instead and reset the buffer once I'm encountering a file separator. I'll try that out. > Why uint16_t? There is no specific reason. I'll change it to `uint32_t`. > We have `pointer_delta()` for that btw if you want to be super correct. See globalDefinitions.hpp Ah that's great! Will use that one instead. > src/hotspot/share/utilities/elfFile.hpp line 865: > >> 863: bool get_filename_from_header(uint32_t file_index, char* filename, size_t filename_len); >> 864: static void strip_path_prefix(char* filename, const size_t filename_len); >> 865: static void copy_dwarf_filename_to_filename(char* src, size_t src_len, char* dst, size_t dst_len); > > Stupid question, do these have to be exposed? Or could they be just static helpers in elfFile.cpp? In theory, they don't need to be exposed in the sense of being declared in the header. But in terms of readability, I've decided to put them in the private block of class `LineNumberProgram` which is the only user of these methods. What would be the advantages of moving them completely to `elfFile.cpp` as static helper functions? 
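Editor's sketch of the idea mentioned above, reading the filename one character at a time and resetting the buffer at each file separator, so that only the last path component ever needs to be stored. This is a minimal illustration under stated assumptions, not the code in `elfFile.cpp`: a null-terminated string stands in for the DWARF reader, `/` is assumed to be the only separator, and `read_basename` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Streams 'src' character by character into 'dst', restarting at every
// '/' so that 'dst' ends up holding only the last path component.
// Returns true if that component fit completely, false if it was cut off
// (or if no usable buffer was provided).
static bool read_basename(const char* src, char* dst, std::size_t dst_len) {
  if (dst == nullptr || dst_len == 0) {
    return false;
  }
  std::size_t pos = 0;
  bool overflow = false;
  for (const char* p = src; *p != '\0'; ++p) {
    if (*p == '/') {
      // New path component: forget what was collected so far.
      pos = 0;
      overflow = false;
    } else if (pos + 1 < dst_len) {
      dst[pos++] = *p;   // Always leave room for the terminator.
    } else {
      overflow = true;   // Component is longer than the buffer.
    }
  }
  dst[pos] = '\0';
  return !overflow;
}
```

With this shape the caller's output buffer can be filled directly, which would avoid the separate 1024-byte array on the stack, at the cost of one read per character.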
------------- PR: https://git.openjdk.org/jdk/pull/10287 From alanb at openjdk.org Wed Nov 16 16:04:17 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 16 Nov 2022 16:04:17 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v27] In-Reply-To: <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> Message-ID: <Rcdl--DkeXEh4meWxeCpSm3hEIDQs6HyFUuAn7Kt4aY=.60d633a0-9f8d-4f45-bf6f-d52cdf279c66@github.com> On Tue, 15 Nov 2022 18:47:39 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in SegmentScope javadoc src/java.base/share/classes/java/lang/foreign/Arena.java line 132: > 130: * and all the memory segments associated with it can no longer be accessed. Furthermore, any off-heap region of memory backing the > 131: * segments associated with that scope are also released. > 132: * @throws IllegalStateException if the arena has already been {@linkplain #close() closed}. It's not wrong to specify that close throw if already closed but it goes against the advice in AutoCloseable to try to have close methods be idempotent. There may be a good reason for this but I can't help wondering if there are error cases when wrapping that might lead to close being called more than once. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From jvernee at openjdk.org Wed Nov 16 16:07:10 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 16 Nov 2022 16:07:10 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v5] In-Reply-To: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> Message-ID: <uNxm9lN79Wz-secRQNCskASjaro-2X4zKiHAsvaW4To=.48899d3b-74a5-486a-bdaa-8b7974619c08@github.com> > Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. > > This is split off from the main JEP integration to make reviewing easier. > > This includes the following patches: > > 1. https://github.com/openjdk/panama-foreign/pull/698 > 2. https://github.com/openjdk/panama-foreign/pull/699 > 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 > 4. https://github.com/openjdk/panama-foreign/pull/740 > 5. https://github.com/openjdk/panama-foreign/pull/746 > 6. https://github.com/openjdk/panama-foreign/pull/742 > 7. https://github.com/openjdk/panama-foreign/pull/743 > > Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. > > The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. > > Please refer to the PR of each individual patch for a more detailed description. 
Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - constexpr some functions - Review pt1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11019/files - new: https://git.openjdk.org/jdk/pull/11019/files/7b1b95f5..3f375cfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=03-04 Stats: 233 lines in 14 files changed: 69 ins; 110 del; 54 mod Patch: https://git.openjdk.org/jdk/pull/11019.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11019/head:pull/11019 PR: https://git.openjdk.org/jdk/pull/11019 From stuefe at openjdk.org Wed Nov 16 16:11:51 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 16:11:51 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: <-LPuNwZylHWH7M8ifGeNw27j7T7-kQzbKa4y0H6cHvM=.8ea671b0-980d-4a81-94b6-1ddab48c1fea@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> <3CD6ryhzVPUpiJTkeepDevEr96HfvCkLdCR2AzmxqoA=.db139ad6-cbb6-4791-a6d1-4d28e54fdcd1@github.com> <-LPuNwZylHWH7M8ifGeNw27j7T7-kQzbKa4y0H6cHvM=.8ea671b0-980d-4a81-94b6-1ddab48c1fea@github.com> Message-ID: <S4pT9MhCorn5XvwppyTIKDMWmutvDzcZShwzGbvwGAA=.45cbb5d3-3ef0-4b9b-ba7c-0ad8ab7152b9@github.com> On Wed, 16 Nov 2022 15:50:31 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> src/hotspot/share/utilities/elfFile.cpp line 1651: >> >>> 1649: int bytes_written = jio_snprintf(dst, count, "%s", src); >>> 1650: // Add null terminator. >>> 1651: dst[count - 1] = '\0'; >> >> Does it make sense to return a truncated file name up to the caller of `DwarfFile::LineNumberProgram::get_filename_from_header()`? Will this not just confuse him? I think it makes more sense to cleanly handle truncation, and e.g. 
skip file parsing for dwarf files with too long names. > > I think you're right that it's rather unexpected to get an incomplete filename back. But just silently skipping the filename might be confusing as well. We could either print an error (I guess that's useful either way but should only be printed with `TraceDwarfLevel`) or just return a generic "buffer overflow" string as filename instead if it fits into the provided filename buffer. And only if that's not possible we could silently skip the filename. Would that be an option? Optional tracing and returning a generic string sounds fine. >> src/hotspot/share/utilities/elfFile.hpp line 865: >> >>> 863: bool get_filename_from_header(uint32_t file_index, char* filename, size_t filename_len); >>> 864: static void strip_path_prefix(char* filename, const size_t filename_len); >>> 865: static void copy_dwarf_filename_to_filename(char* src, size_t src_len, char* dst, size_t dst_len); >> >> Stupid question, do these have to be exposed? Or could they be just static helpers in elfFile.cpp? > > In theory, they don't need to be exposed in the sense of being declared in the header. But in terms of readability, I've decided to put them in the private block of class `LineNumberProgram` which is the only user of these methods. What would be the advantages of moving them completely to `elfFile.cpp` as static helper functions? Oh, I generally just prefer to keep things locally if possible. Slim interfaces, less polluted global namespace, possibly (though not here) less include deps. Don't worry, if you prefer it this way, keep it in. 
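Editor's sketch of the fallback order agreed on above: copy the filename if it fits, otherwise return a generic placeholder string if that fits, and only skip the filename when neither is possible. It is an illustration under assumptions, not the patch itself: `CopyResult`, `kOverflowText` and `copy_filename_or_placeholder` are hypothetical names, the placeholder wording is invented, and the optional tracing under `TraceDwarfLevel` is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical result codes, not taken from elfFile.cpp.
enum class CopyResult { Copied, Placeholder, Skipped };

// Hypothetical placeholder text; the thread only says "a generic string".
static const char* const kOverflowText = "<filename too long>";

// Copies 'src' into 'dst' without ever returning a truncated name.
static CopyResult copy_filename_or_placeholder(const char* src, char* dst,
                                               std::size_t dst_len) {
  if (dst == nullptr || dst_len == 0) {
    return CopyResult::Skipped;
  }
  const std::size_t src_len = std::strlen(src);
  if (src_len < dst_len) {
    std::memcpy(dst, src, src_len + 1);            // Includes the terminator.
    return CopyResult::Copied;
  }
  const std::size_t text_len = std::strlen(kOverflowText);
  if (text_len < dst_len) {
    std::memcpy(dst, kOverflowText, text_len + 1); // Generic string instead.
    return CopyResult::Placeholder;
  }
  dst[0] = '\0';                                   // Nothing fits: skip.
  return CopyResult::Skipped;
}
```

Returning a distinct result for each case lets the caller decide whether to emit a trace message, rather than burying that decision in the copy helper.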
------------- PR: https://git.openjdk.org/jdk/pull/10287 From mcimadamore at openjdk.org Wed Nov 16 16:16:21 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 16 Nov 2022 16:16:21 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v27] In-Reply-To: <Rcdl--DkeXEh4meWxeCpSm3hEIDQs6HyFUuAn7Kt4aY=.60d633a0-9f8d-4f45-bf6f-d52cdf279c66@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> <Rcdl--DkeXEh4meWxeCpSm3hEIDQs6HyFUuAn7Kt4aY=.60d633a0-9f8d-4f45-bf6f-d52cdf279c66@github.com> Message-ID: <GxQqRxeRLyn3r7KOMt7peCz_dz2ih4Z7V2I7a0bm-tg=.6aafdbc9-cdf3-416d-aa5c-94c23640976f@github.com> On Wed, 16 Nov 2022 16:01:52 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix typo in SegmentScope javadoc > > src/java.base/share/classes/java/lang/foreign/Arena.java line 132: > >> 130: * and all the memory segments associated with it can no longer be accessed. Furthermore, any off-heap region of memory backing the >> 131: * segments associated with that scope are also released. >> 132: * @throws IllegalStateException if the arena has already been {@linkplain #close() closed}. > > It's not wrong to specify that close throw if already closed but it goes against the advice in AutoCloseable to try to have close methods be idempotent. There may be a good reason for this but I can't help wondering if there are error cases when wrapping that might lead to close being called more than once. In our experience with using the API, having exceptions when something is funny about close is very valuable info (as also stated in the javadoc). Almost always there's a subtle temporal bug going on which the ISE catches. 
I'm not sure if here you refer to the fact that the javadoc is being overly broad in saying "already been closed" instead of "already been closed _successfully_" ? What kind of problems are you thinking of? ------------- PR: https://git.openjdk.org/jdk/pull/10872 From alanb at openjdk.org Wed Nov 16 16:16:22 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 16 Nov 2022 16:16:22 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v27] In-Reply-To: <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> Message-ID: <Yr6tHpETjKb_0mE36rbbm9d9OVOmfl17SlaeglTrVTs=.2ed7bf1d-9869-4e03-81ad-6fdf7a7a94f7@github.com> On Tue, 15 Nov 2022 18:47:39 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in SegmentScope javadoc src/java.base/share/classes/java/lang/foreign/SegmentScope.java line 8: > 6: > 7: /** > 8: * A segment scope controls access to a memory segment. A passing comment here is that "to a memory segment" hints of one-to-one relationship when it's actually one-to-many. Arena is specified to control the lifecycle "of memory segments". 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From stefank at openjdk.org Wed Nov 16 16:17:48 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 16:17:48 GMT Subject: RFR: 8296886: Fix various include sort order issues [v2] In-Reply-To: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> Message-ID: <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> > The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. > > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - Cleanups - Merge remote-tracking branch 'upstream/master' into various_include_order_fixes - Various include order fixes ------------- Changes: https://git.openjdk.org/jdk/pull/11108/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11108&range=01 Stats: 323 lines in 116 files changed: 143 ins; 163 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/11108.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11108/head:pull/11108 PR: https://git.openjdk.org/jdk/pull/11108 From alanb at openjdk.org Wed Nov 16 16:40:31 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 16 Nov 2022 16:40:31 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v27] In-Reply-To: <GxQqRxeRLyn3r7KOMt7peCz_dz2ih4Z7V2I7a0bm-tg=.6aafdbc9-cdf3-416d-aa5c-94c23640976f@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> <Rcdl--DkeXEh4meWxeCpSm3hEIDQs6HyFUuAn7Kt4aY=.60d633a0-9f8d-4f45-bf6f-d52cdf279c66@github.com> <GxQqRxeRLyn3r7KOMt7peCz_dz2ih4Z7V2I7a0bm-tg=.6aafdbc9-cdf3-416d-aa5c-94c23640976f@github.com> Message-ID: <YK0th_GmZZssrYNntdATL8n211gcqimnSNAFVv0_mFM=.870d9ed6-30bc-4eb9-874e-93e4c247c20b@github.com> On Wed, 16 Nov 2022 16:13:16 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> src/java.base/share/classes/java/lang/foreign/Arena.java line 132: >> >>> 130: * and all the memory segments associated with it can no longer be accessed. Furthermore, any off-heap region of memory backing the >>> 131: * segments associated with that scope are also released. >>> 132: * @throws IllegalStateException if the arena has already been {@linkplain #close() closed}. >> >> It's not wrong to specify that close throw if already closed but it goes against the advice in AutoCloseable to try to have close methods be idempotent. 
There may be a good reason for this but I can't help wondering if there are error cases when wrapping that might lead to close being called more than once. > > In our experience with using the API, having exceptions when something is funny about close is very valuable info (as also stated in the javadoc). Almost always there's a subtle temporal bug going on which the ISE catches. I'm not sure if here you refer to the fact that the javadoc is being overly broad in saying "already been closed" instead of "already been closed _successfully_" ? What kind of problems are you thinking of? Most of the AutoCloseable in the platform are Closeables where close is specified to have no effect when already closed. With a confined Arena it would be benign for the owner to invoke close again. If it's been useful at finding bugs then okay. The scenario that made me wonder about this is something like the follow where MyWrapper::close invokes Arena::close. try (var arena = Arena.openConfined(); var wrapper = new MyWrapper(arena)) { : } ------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Wed Nov 16 16:44:31 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 16 Nov 2022 16:44:31 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v27] In-Reply-To: <YK0th_GmZZssrYNntdATL8n211gcqimnSNAFVv0_mFM=.870d9ed6-30bc-4eb9-874e-93e4c247c20b@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> <Rcdl--DkeXEh4meWxeCpSm3hEIDQs6HyFUuAn7Kt4aY=.60d633a0-9f8d-4f45-bf6f-d52cdf279c66@github.com> <GxQqRxeRLyn3r7KOMt7peCz_dz2ih4Z7V2I7a0bm-tg=.6aafdbc9-cdf3-416d-aa5c-94c23640976f@github.com> <YK0th_GmZZssrYNntdATL8n211gcqimnSNAFVv0_mFM=.870d9ed6-30bc-4eb9-874e-93e4c247c20b@github.com> Message-ID: 
<IJd6eOdLK-bJOb-SfLTs7rDagHn4FlsBPOJPNKm7DAM=.c3688eed-8821-4d3a-89b1-1e767eb4810e@github.com> On Wed, 16 Nov 2022 16:38:10 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> In our experience with using the API, having exceptions when something is funny about close is very valuable info (as also stated in the javadoc). Almost always there's a subtle temporal bug going on which the ISE catches. I'm not sure if here you refer to the fact that the javadoc is being overly broad in saying "already been closed" instead of "already been closed _successfully_" ? What kind of problems are you thinking of? > > Most of the AutoCloseable in the platform are Closeables where close is specified to have no effect when already closed. With a confined Arena it would be benign for the owner to invoke close again. If it's been useful at finding bugs then okay. The scenario that made me wonder about this is something like the follow where MyWrapper::close invokes Arena::close. > > try (var arena = Arena.openConfined(); > var wrapper = new MyWrapper(arena)) { > : > } Actually, I see that the `@apiNote` we used to have has disappeared in the API reshuffling. I will add it back. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From stefank at openjdk.org Wed Nov 16 16:49:17 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 16:49:17 GMT Subject: RFR: 8296785: Use realloc for CHeap-allocated BitMaps [v2] In-Reply-To: <KWDOAI6zJ5tiSTYAejlT2VbSReSMww2lnI376WbKW1c=.b2570da9-02f8-41c2-99e5-e807a14066f5@github.com> References: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> <VCm7nSBm1Dgw9LSHBg1FBDiu10MxK-0Po-tLt2mk-ac=.523c8f78-2f4c-4786-a5e1-d90f633dfebd@github.com> <KWDOAI6zJ5tiSTYAejlT2VbSReSMww2lnI376WbKW1c=.b2570da9-02f8-41c2-99e5-e807a14066f5@github.com> Message-ID: <qQYUEmAYovsHM31FHt3fX9jzRL01BPlJGAYRP6ro-Zg=.3edc9463-502f-42c6-a675-03515e1c7ce0@github.com> On Wed, 16 Nov 2022 14:48:57 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - 8296785: Use realloc for CHeap-allocated BitMaps >> - Merge remote-tracking branch 'upstream/master' into 8296774_bitmap_stricter_construction >> - 8296774: Removed default MEMFLAGS value from CHeapBitMap > > src/hotspot/share/utilities/bitMap.cpp line 53: > >> 51: >> 52: return map; >> 53: } > > Could this live inside `GrowableBitmap<T>` as a private static function? It could, though I see that function as an implementation detail of ArenaBitMap and ResourceBitMap, that I don't want to burden CHeapBitMap with. I also like if we can keep the definition of GrowableBitMap simple, so that it's easier to read and understand. So, I'd prefer to not have to do this. I hope that's OK. 
------------- PR: https://git.openjdk.org/jdk/pull/11102 From mcimadamore at openjdk.org Wed Nov 16 16:54:41 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 16 Nov 2022 16:54:41 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v27] In-Reply-To: <IJd6eOdLK-bJOb-SfLTs7rDagHn4FlsBPOJPNKm7DAM=.c3688eed-8821-4d3a-89b1-1e767eb4810e@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> <Rcdl--DkeXEh4meWxeCpSm3hEIDQs6HyFUuAn7Kt4aY=.60d633a0-9f8d-4f45-bf6f-d52cdf279c66@github.com> <GxQqRxeRLyn3r7KOMt7peCz_dz2ih4Z7V2I7a0bm-tg=.6aafdbc9-cdf3-416d-aa5c-94c23640976f@github.com> <YK0th_GmZZssrYNntdATL8n211gcqimnSNAFVv0_mFM=.870d9ed6-30bc-4eb9-874e-93e4c247c20b@github.com> <IJd6eOdLK-bJOb-SfLTs7rDagHn4FlsBPOJPNKm7DAM=.c3688eed-8821-4d3a-89b1-1e767eb4810e@github.com> Message-ID: <XldFLHsbVJyfLolNAa3nJjl1VyRWDqzec8sXXqs0Oqc=.58322b61-7fa0-49a8-91e5-3bddbfdef244@github.com> On Wed, 16 Nov 2022 16:41:45 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Most of the AutoCloseable in the platform are Closeables where close is specified to have no effect when already closed. With a confined Arena it would be benign for the owner to invoke close again. If it's been useful at finding bugs then okay. The scenario that made me wonder about this is something like the follow where MyWrapper::close invokes Arena::close. >> >> try (var arena = Arena.openConfined(); >> var wrapper = new MyWrapper(arena)) { >> : >> } > > Actually, I see that the `@apiNote` we used to have has disappeared in the API reshuffling. I will add it back. > Most of the AutoCloseable in the platform are Closeables where close is specified to have no effect when already closed. With a confined Arena it would be benign for the owner to invoke close again. 
If it's been useful at finding bugs then okay. The scenario that made me wonder about this is something like the follow where MyWrapper::close invokes Arena::close. > > ``` > try (var arena = Arena.openConfined(); > var wrapper = new MyWrapper(arena)) { > : > } > ``` Sure - this would be problematic - however it seems an edge case (could the TWR just use MyWrapper?) I'd prefer to leave it as is for now, and revisit - so far we had no indications of this being a real problem, whereas we had cases where the thrown exception has been useful to spot issues. If consistency with the rest of the JDK is considered more important we can fix it later. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From psandoz at openjdk.org Wed Nov 16 16:54:47 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 16 Nov 2022 16:54:47 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v27] In-Reply-To: <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> Message-ID: <pwQc-QIMW2rHOtyW9Dmlx8ur4xvAHQuyMI_LzFMmLmo=.3c518cbf-7d68-4b73-9f22-d59478555ace@github.com> On Tue, 15 Nov 2022 18:47:39 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in SegmentScope javadoc src/java.base/share/classes/java/lang/foreign/Arena.java line 132: > 130: * and all the memory segments associated with it can no longer be accessed. 
Furthermore, any off-heap region of memory backing the > 131: * segments associated with that scope are also released. > 132: * @throws IllegalStateException if the arena has already been {@linkplain #close() closed}. JavaDoc was pointing to itself. Suggestion: * @throws IllegalStateException if the arena has already been closed. src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 109: > 107: * Finally, access operations on a memory segment are subject to the thread-confinement checks enforced by the associated > 108: * scope; that is, if the segment is the {@linkplain SegmentScope#global() global scope} or an {@linkplain SegmentScope#auto() automatic scope}, > 109: * it can be accessed by multiple threads. If the segment is associatd with an arena scope, then it can only be Typo: Suggestion: * it can be accessed by multiple threads. If the segment is associated with an arena scope, then it can only be src/java.base/share/classes/java/lang/foreign/SegmentScope.java line 10: > 8: * A segment scope controls access to a memory segment. > 9: * <p> > 10: * A memory segment can only be accessed while its scope is {@linkplain #isAlive() alive}. Moreoever, Typo: Suggestion: * A memory segment can only be accessed while its scope is {@linkplain #isAlive() alive}. Moreover, ------------- PR: https://git.openjdk.org/jdk/pull/10872 From aph at openjdk.org Wed Nov 16 16:55:24 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 16:55:24 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v10] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <E7HwXXjC4P7B2RXor6pCyw9g8ZaW63bkvF_vZBw7Obs=.bf6400de-54cc-429f-a898-d27328e542f6@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Javadoc changes. - ProblemList.txt cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/2a2b0cca..280cd6c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=08-09 Stats: 46 lines in 3 files changed: 8 ins; 6 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From jvernee at openjdk.org Wed Nov 16 17:00:04 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 16 Nov 2022 17:00:04 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v5] In-Reply-To: <uNxm9lN79Wz-secRQNCskASjaro-2X4zKiHAsvaW4To=.48899d3b-74a5-486a-bdaa-8b7974619c08@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <uNxm9lN79Wz-secRQNCskASjaro-2X4zKiHAsvaW4To=.48899d3b-74a5-486a-bdaa-8b7974619c08@github.com> Message-ID: <yFyylv85NW3Gcns9L2W3zSCdH0J9RqlVl1_iHbFHIHc=.062044d8-4170-4e0d-a9b6-a9bb25a27f0d@github.com> On Wed, 16 Nov 2022 16:07:10 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. >> >> This is split off from the main JEP integration to make reviewing easier. >> >> This includes the following patches: >> >> 1. https://github.com/openjdk/panama-foreign/pull/698 >> 2. https://github.com/openjdk/panama-foreign/pull/699 >> 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 >> 4. https://github.com/openjdk/panama-foreign/pull/740 >> 5. https://github.com/openjdk/panama-foreign/pull/746 >> 6. https://github.com/openjdk/panama-foreign/pull/742 >> 7. 
https://github.com/openjdk/panama-foreign/pull/743 >> >> Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. >> >> The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. >> >> Please refer to the PR of each individual patch for a more detailed description. > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - constexpr some functions > - Review pt1 I've addressed all review comments so far. I've limited use of `constexpr` to functions that are called on register constants to avoid having to put `constexpr` on too many things (for now). ------------- PR: https://git.openjdk.org/jdk/pull/11019 From jvernee at openjdk.org Wed Nov 16 17:04:03 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 16 Nov 2022 17:04:03 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v6] In-Reply-To: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> Message-ID: <xJeSQMXY8k99ViKk6B1ceb6SIia3OCexyoHn-dLuUDY=.3d0d0ae6-64ea-4bb1-94cd-196d82cb8be4@github.com> > Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. > > This is split off from the main JEP integration to make reviewing easier. > > This includes the following patches: > > 1. https://github.com/openjdk/panama-foreign/pull/698 > 2. https://github.com/openjdk/panama-foreign/pull/699 > 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 > 4.
https://github.com/openjdk/panama-foreign/pull/740 > 5. https://github.com/openjdk/panama-foreign/pull/746 > 6. https://github.com/openjdk/panama-foreign/pull/742 > 7. https://github.com/openjdk/panama-foreign/pull/743 > > Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. > > The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. > > Please refer to the PR of each individual patch for a more detailed description. Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: fix stubs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11019/files - new: https://git.openjdk.org/jdk/pull/11019/files/3f375cfd..4d440443 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=04-05 Stats: 5 lines in 5 files changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11019.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11019/head:pull/11019 PR: https://git.openjdk.org/jdk/pull/11019 From stefank at openjdk.org Wed Nov 16 17:03:21 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 16 Nov 2022 17:03:21 GMT Subject: RFR: 8296785: Use realloc for CHeap-allocated BitMaps [v3] In-Reply-To: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> References: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> Message-ID: <gVFqJ_xOx-LskzaCU9mylHrccLnoCOAWu-zeK_IEzUU=.97674576-c596-48b2-8249-ec62f2edccf2@github.com> > Today CHeap allocated bitmaps don't resize with realloc. 
I'd like to change that by fixing that by adding support for realloc in the ArrayAllocator classes, and then use that when resizing the bitmaps. > > We've been using and testing one version of this patch in the Generational ZGC repository for a while now. That version is slightly different because of recent rewrites of the bitmaps, but in essence the same. See: > https://github.com/openjdk/zgc/commit/ca692f686bda8d86d3786c2afc782bfdc54fbdfc Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Fixes after merge - Merge remote-tracking branch 'upstream/master' into 8296785_bitmap_realloc - 8296785: Use realloc for CHeap-allocated BitMaps - Merge remote-tracking branch 'upstream/master' into 8296774_bitmap_stricter_construction - 8296774: Removed default MEMFLAGS value from CHeapBitMap ------------- Changes: https://git.openjdk.org/jdk/pull/11102/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11102&range=02 Stats: 226 lines in 5 files changed: 174 ins; 30 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/11102.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11102/head:pull/11102 PR: https://git.openjdk.org/jdk/pull/11102 From stuefe at openjdk.org Wed Nov 16 17:03:27 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 16 Nov 2022 17:03:27 GMT Subject: RFR: 8296785: Use realloc for CHeap-allocated BitMaps [v2] In-Reply-To: <qQYUEmAYovsHM31FHt3fX9jzRL01BPlJGAYRP6ro-Zg=.3edc9463-502f-42c6-a675-03515e1c7ce0@github.com> References: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> <VCm7nSBm1Dgw9LSHBg1FBDiu10MxK-0Po-tLt2mk-ac=.523c8f78-2f4c-4786-a5e1-d90f633dfebd@github.com> <KWDOAI6zJ5tiSTYAejlT2VbSReSMww2lnI376WbKW1c=.b2570da9-02f8-41c2-99e5-e807a14066f5@github.com> <qQYUEmAYovsHM31FHt3fX9jzRL01BPlJGAYRP6ro-Zg=.3edc9463-502f-42c6-a675-03515e1c7ce0@github.com> Message-ID: 
<0n74vF4l4Etul2U_Clqrh9U2Jf_N7LvzbTYORdzzevE=.ab48a17c-d2a8-40e9-b65a-79b2fb808f00@github.com> On Wed, 16 Nov 2022 16:47:01 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> src/hotspot/share/utilities/bitMap.cpp line 53: >> >>> 51: >>> 52: return map; >>> 53: } >> >> Could this live inside `GrowableBitmap<T>` as a private static function? > > It could, though I see that function as an implementation detail of ArenaBitMap and ResourceBitMap, that I don't want to burden CHeapBitMap with. I also like if we can keep the definition of GrowableBitMap simple, so that it's easier to read and understand. So, I'd prefer to not have to do this. I hope that's OK. Fine with me. ------------- PR: https://git.openjdk.org/jdk/pull/11102 From aboldtch at openjdk.org Wed Nov 16 17:44:07 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 16 Nov 2022 17:44:07 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v3] In-Reply-To: <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> Message-ID: <-YBmWkv1MgkIsEoMtU2nWWZouVioSN-k6Tz3-D6K6yw=.0c867d12-4532-4369-8a94-e23c1d667408@github.com> On Mon, 14 Nov 2022 07:29:25 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. >> >> To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. 
>> >> --- >> >> Patch >> >> - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. >> - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. >> - Removed a stray newline from print_native_stack to clean output. >> - added regression testing for this feature. I removed my name from the test since we don't do this anymore. >> - added clarifying comments to the test and code >> - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) >> >> Output looks like this: >> >> >> $ java ... -XX:+ErrorLogSecondaryErrorDetails >> >> >> will produce, for secondary errors, siginfo and call stack. >> >> >> [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] >> [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] >> [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) >> V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) >> V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) >> V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) >> V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) >> V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) >> C [libc.so.6+0x43090] >> V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) >> C 
[libjli.so+0x4013] JavaMain+0x93 (java.c:1457) >> C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) >> ] > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors' of github.com:tstuefe/jdk into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors > - Feedback David Always nice with more, simple and unobtrusive development tooling. Just have a couple of comments/questions? src/hotspot/share/utilities/vmError.cpp line 1635: > 1633: // Any information (signal, context, siginfo etc) printed here should use the function > 1634: // arguments, not the information stored in *this, since those describe the primary crash. > 1635: char tmp[256]; // cannot use global scratch buffer Is there any problems with making this static? Given that we care about the stack depth when we have repeated crashes. src/hotspot/share/utilities/vmError.cpp line 1641: > 1639: _current_step_info, id); > 1640: if (os::exception_name(id, tmp, sizeof(tmp))) { > 1641: st->print(", %s (0x%x) at pc=" PTR_FORMAT, tmp, id, p2i(pc)); Not really relevant for this PR. But do not like the inconsistency that the id is printed as hex here, decimal in os::print_siginfo for posix, and hex in os::print_siginfo for windows. src/hotspot/share/utilities/vmError.cpp line 1653: > 1651: st->print_cr("]"); > 1652: #ifdef ASSERT > 1653: if (ErrorLogSecondaryErrorDetails) { > But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Should there be logic here that at least guarantee progress? ```c++ if (ErrorLogSecondaryErrorDetails && !_some_bool) { _some_bool = true; [...] 
} _some_bool = false; ------------- PR: https://git.openjdk.org/jdk/pull/11118 From never at openjdk.org Wed Nov 16 17:58:40 2022 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 16 Nov 2022 17:58:40 GMT Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal In-Reply-To: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> References: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> Message-ID: <wwPTou99ofYY5NYFepDOTwlQP374s6pIwrbFLiWDfMA=.debe6ee8-b1bb-40dc-92df-7ccdb33c888a@github.com> On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez <never at openjdk.org> wrote: > This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed. mach5 runs with the requested options on a GraalVM have passed so I'm going to merge this now. Sound good? ------------- PR: https://git.openjdk.org/jdk/pull/5625 From mcimadamore at openjdk.org Wed Nov 16 18:18:43 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 16 Nov 2022 18:18:43 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v6] In-Reply-To: <xJeSQMXY8k99ViKk6B1ceb6SIia3OCexyoHn-dLuUDY=.3d0d0ae6-64ea-4bb1-94cd-196d82cb8be4@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <xJeSQMXY8k99ViKk6B1ceb6SIia3OCexyoHn-dLuUDY=.3d0d0ae6-64ea-4bb1-94cd-196d82cb8be4@github.com> Message-ID: <ygvevJWEJF_cnHzGVDruHLvlNbiYTQ1hlN6evZvcj94=.1022d291-5b03-427e-ac42-233a52704ac1@github.com> On Wed, 16 Nov 2022 17:04:03 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. >> >> This is split off from the main JEP integration to make reviewing easier. >> >> This includes the following patches: >> >> 1. 
https://github.com/openjdk/panama-foreign/pull/698 >> 2. https://github.com/openjdk/panama-foreign/pull/699 >> 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 >> 4. https://github.com/openjdk/panama-foreign/pull/740 >> 5. https://github.com/openjdk/panama-foreign/pull/746 >> 6. https://github.com/openjdk/panama-foreign/pull/742 >> 7. https://github.com/openjdk/panama-foreign/pull/743 >> >> Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. >> >> The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. >> >> Please refer to the PR of each individual patch for a more detailed description. > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > fix stubs Java changes look good ------------- Marked as reviewed by mcimadamore (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11019 From redestad at openjdk.org Wed Nov 16 18:22:30 2022 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 16 Nov 2022 18:22:30 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> Message-ID: <d5iY-dwoEXUM9u1-exI-nkWwd9Hv-1QhVFSACbLtsz0=.2f040010-af30-40a5-8f19-1c5e3f8a01cd@github.com> On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad <redestad at openjdk.org> wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ±
0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ±
62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Missing & 0xff in StringLatin1::hashCode I'm getting pulled into other tasks and would request for this to be either accepted as-is, rejected or picked up by someone else to rewrite it to something that can be accepted. Obviously I'm biased towards acceptance: While imperfect, it provides improved testing - both functional and performance-wise - and establishes a significantly improved benchmark for more future-proof solutions to beat. There are many ways to iteratively improve upon this solution, some of which would even simplify the implementation. But in the face of upcoming changes that might allow C2 to optimize these kinds of loops without intrinsic support I am not sure spending more time on perfecting the current patch is worth our while.
Rejecting it might be the reasonable thing to do, too, especially if the C2 loop optimizations @iwanowww points out might be coming around sooner rather than later. Even if that's not coming soon, the PR at hand adds a chunk of complexity for the compiler team to maintain. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From mcimadamore at openjdk.org Wed Nov 16 18:28:23 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 16 Nov 2022 18:28:23 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <FHV54avkRj2f7vf-_chBEqAAVP3r-CZpU9PfJlldWmU=.87413912-0826-4d80-8f71-0134ddd2994a@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <FHV54avkRj2f7vf-_chBEqAAVP3r-CZpU9PfJlldWmU=.87413912-0826-4d80-8f71-0134ddd2994a@github.com> Message-ID: <NFdU7AI9d-0to8S9U8ejxndmN5goTf7szK2kVuMzMcU=.cd22a103-7e3a-4049-9c2b-15b34e8d0ab5@github.com> On Wed, 16 Nov 2022 15:58:49 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 460: >> >>> 458: * } >>> 459: * >>> 460: * @param key the ScopedValue key >> >> should use `@code` or `@link` > > Sorry, I don't understand what you want. There is a "bare" reference to `ScopedValue`. Should it be `{@code ScopedValue}` or `{@link ScopedValue}` ? 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From kvn at openjdk.org Wed Nov 16 18:47:00 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 16 Nov 2022 18:47:00 GMT Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal In-Reply-To: <wwPTou99ofYY5NYFepDOTwlQP374s6pIwrbFLiWDfMA=.debe6ee8-b1bb-40dc-92df-7ccdb33c888a@github.com> References: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> <wwPTou99ofYY5NYFepDOTwlQP374s6pIwrbFLiWDfMA=.debe6ee8-b1bb-40dc-92df-7ccdb33c888a@github.com> Message-ID: <WckWlhFMisw1KOuUOacM2mSHot3l897kx9-3nXQrLPY=.a9c77af5-b493-4b03-bcd9-469c91cff4eb@github.com> On Wed, 16 Nov 2022 17:56:12 GMT, Tom Rodriguez <never at openjdk.org> wrote: > mach5 runs with the requested options on a GraalVM have passed so I'm going to merge this now. Sound good? Yes ------------- PR: https://git.openjdk.org/jdk/pull/5625 From aph at openjdk.org Wed Nov 16 19:06:59 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 19:06:59 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v10] In-Reply-To: <40rwB5m6Mskkevkwkj8B34o540txfesN7P-pOGWPfqA=.4cf0adb3-1e3d-4a87-b2bf-505c7f15d487@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <FclwJZRKpjSGCNnl3yRZRRzmvJuQlXNHFmdcQ742p18=.f9ed06dd-4602-400d-aa91-2091b9f26982@github.com> <40rwB5m6Mskkevkwkj8B34o540txfesN7P-pOGWPfqA=.4cf0adb3-1e3d-4a87-b2bf-505c7f15d487@github.com> Message-ID: <6PNLRgFIjkvIRL_tIW0btKxdEapZI6_JC8roNRFBSws=.a6292551-ac7b-4f45-a851-3c4c614edc3b@github.com> On Fri, 4 Nov 2022 09:50:10 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 481: >> >>> 479: } >>> 480: */ >>> 481: return findBinding() != Snapshot.NIL; >> >> This should probably call `Cache.put(this, value)` when `findBinding()`
isn't `Snapshot.NIL`, since it's likely that `isBound()` will most commonly be used in the form of: >> >> if (SCOPED_VALUE.isBound()) { >> final var value = SCOPED_VALUE.get(); >> // do something with `value` >> } >> >> >> -------------------------------------------------------------------------------- >> >> Suggestion: >> >> var value = findBinding(); >> if (value == Snapshot.NIL) { >> return false; >> } >> Cache.put(this, value); >> return true; > > Probably so, yes. I'll have a look at that along with caching failure. So I just did the experiment of caching failures and the result of `isBound()`. This test: @Benchmark @OutputTimeUnit(TimeUnit.NANOSECONDS) public int thousandMaybeGets(Blackhole bh) throws Exception { int result = 0; for (int i = 0; i < 1_000; i++) { if (ScopedValuesData.sl1.isBound()) { result += ScopedValuesData.sl1.get(); } } return result; } Before and after: ScopedValues.thousandMaybeGets avgt 10 13436.112 ± 20.885 ns/op ScopedValues.thousandMaybeGets avgt 10 56.315 ± 0.583 ns/op You may have a point. The experiment is on a branch called `JDK-8286666-cache-queries` in [My personal repo](https://github.com/theRealAph/jdk). I'd push it now but it's getting a bit late to make such changes now. WDYT?
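As a rough illustration of the idea under discussion (letting `isBound()` remember what it found, including a miss, so that a following `get()` can skip the slow search), here is a minimal Java sketch. It is not the `ScopedValue` implementation: the class `CachingLookup`, its single-slot cache, the `NIL` sentinel and the `slowSearches` counter are hypothetical names made up for this example, and threads are ignored entirely.

```java
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical, single-threaded stand-in for a scoped value.
// This is NOT the jdk.incubator.concurrent.ScopedValue implementation.
final class CachingLookup<T> {
    // Sentinel meaning "searched, found nothing" (assumed analogue of Snapshot.NIL).
    private static final Object NIL = new Object();

    private Object binding = NIL; // what the slow search would find
    private Object cache;         // null means "nothing cached yet"
    private int slowSearches;     // how many times the slow path ran

    // Bind a non-null value and invalidate the cache.
    void bind(T value) {
        binding = Objects.requireNonNull(value);
        cache = null;
    }

    // Stand-in for the expensive walk over the current bindings.
    private Object findBinding() {
        slowSearches++;
        return binding;
    }

    // Consult the cache first; on a miss, search once and remember the
    // result, whether it was a value or NIL (a cached failure).
    private Object lookup() {
        Object v = cache;
        if (v == null) {
            v = findBinding();
            cache = v;
        }
        return v;
    }

    boolean isBound() {
        return lookup() != NIL;
    }

    @SuppressWarnings("unchecked")
    T get() {
        Object v = lookup();
        if (v == NIL) {
            throw new NoSuchElementException("not bound");
        }
        return (T) v;
    }

    int slowSearches() {
        return slowSearches;
    }
}
```

With this shape, a loop that calls `isBound()` and then `get()` a thousand times runs the slow search once and is served from the cache afterwards, which is the kind of effect the before/after numbers above suggest.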
------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 16 19:11:15 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Nov 2022 19:11:15 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <NFdU7AI9d-0to8S9U8ejxndmN5goTf7szK2kVuMzMcU=.cd22a103-7e3a-4049-9c2b-15b34e8d0ab5@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <FHV54avkRj2f7vf-_chBEqAAVP3r-CZpU9PfJlldWmU=.87413912-0826-4d80-8f71-0134ddd2994a@github.com> <NFdU7AI9d-0to8S9U8ejxndmN5goTf7szK2kVuMzMcU=.cd22a103-7e3a-4049-9c2b-15b34e8d0ab5@github.com> Message-ID: <b14aZ4u72bwzdV7d_XpSwz4ztTSkmRUn4UhQlVhzZPo=.28665341-dd5c-4040-9275-70194d931287@github.com> On Wed, 16 Nov 2022 18:26:15 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> Sorry, I don't understand what you want. > > There is a "bare" reference to `ScopedValue`. Should it be `{@code ScopedValue}` or `{@link ScopedValue}` ? Ahh, OK. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From never at openjdk.org Wed Nov 16 19:15:02 2022 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 16 Nov 2022 19:15:02 GMT Subject: RFR: 8296956: [JVMCI] HotSpotResolvedJavaFieldImpl.getIndex returns wrong value In-Reply-To: <KZxa8Bort_KINFAAyAIGsZgRF6PBRxJGONS_u8t_rVI=.de61e7e8-518a-4cf5-b2b5-68cf4f71cb42@github.com> References: <KZxa8Bort_KINFAAyAIGsZgRF6PBRxJGONS_u8t_rVI=.de61e7e8-518a-4cf5-b2b5-68cf4f71cb42@github.com> Message-ID: <TN9KlLQWHxFz7139CkheZDb4tjwjjbrXmyB9ArcOWTo=.6db1f87a-44c0-49a5-b3d8-b2c66d4d8ae6@github.com> On Mon, 14 Nov 2022 19:37:20 GMT, Doug Simon <dnsimon at openjdk.org> wrote: > This PR fixes a bug related to `HotSpotResolvedJavaFieldImpl.index`. 
Its value is passed into the `HotSpotResolvedJavaFieldImpl` constructor as an `int`, and is returned by `getIndex()` as an `int` but it was stored as a `short`. This meant that unsigned 16-bit values were not handled correctly. > > Also included are some related JVMCI cleanups: > * added and fixed doc related to `ResolvedJavaField.getOffset()` > * replaced assertions with always-enabled checks Marked as reviewed by never (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11142 From alanb at openjdk.org Wed Nov 16 19:17:04 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 16 Nov 2022 19:17:04 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <b14aZ4u72bwzdV7d_XpSwz4ztTSkmRUn4UhQlVhzZPo=.28665341-dd5c-4040-9275-70194d931287@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <FHV54avkRj2f7vf-_chBEqAAVP3r-CZpU9PfJlldWmU=.87413912-0826-4d80-8f71-0134ddd2994a@github.com> <NFdU7AI9d-0to8S9U8ejxndmN5goTf7szK2kVuMzMcU=.cd22a103-7e3a-4049-9c2b-15b34e8d0ab5@github.com> <b14aZ4u72bwzdV7d_XpSwz4ztTSkmRUn4UhQlVhzZPo=.28665341-dd5c-4040-9275-70194d931287@github.com> Message-ID: <B1UmptxQ5alAKr-9Nofkzo04KEdU4iOF47sJSI-dczc=.af326156-a6d3-4022-9db6-dd06d990ea80@github.com> On Wed, 16 Nov 2022 19:08:48 GMT, Andrew Haley <aph at openjdk.org> wrote: >> There is a "bare" reference to `ScopedValue`. Should it be `{@code ScopedValue}` or `{@link ScopedValue}` ? > > Ahh, OK. We fixed those in the recent update. 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From duke at openjdk.org Wed Nov 16 19:32:04 2022 From: duke at openjdk.org (zzambers) Date: Wed, 16 Nov 2022 19:32:04 GMT Subject: RFR: 8295952: Problemlist existing compiler/rtm tests also on x86 In-Reply-To: <mkKq-X8pqY91OIEeoTxdeXvPr3Xsuk2P1IGcVaB0mt0=.b8c17931-2d34-49cd-b22e-03cf8d23ce33@github.com> References: <mkKq-X8pqY91OIEeoTxdeXvPr3Xsuk2P1IGcVaB0mt0=.b8c17931-2d34-49cd-b22e-03cf8d23ce33@github.com> Message-ID: <hN-oLIEnhxT1Dd-Wpb5NuhUOJYzZxnT0Oer1LvhLZuo=.d9f784b7-8b54-4ed0-89b1-9940e927ac35@github.com> On Wed, 26 Oct 2022 16:43:26 GMT, zzambers <duke at openjdk.org> wrote: > Problemlist should be extended so that existing compiler/rtm entries include x86 (32-bit) intel builds as well, as these are also affected. @TobiHartmann @vnkozlov Thanks ------------- PR: https://git.openjdk.org/jdk/pull/10875 From duke at openjdk.org Wed Nov 16 20:52:14 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Nov 2022 20:52:14 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. 
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: >
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>
> and after:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s
Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: redo register alloc with explicit func params ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/cbf49380..dbdfd1dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=18-19 Stats: 387 lines in 2 files changed: 83 ins; 51 del; 253 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From sspitsyn at openjdk.org Wed Nov 16 21:12:22 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 16 Nov 2022 21:12:22 GMT Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal In-Reply-To:
<x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> References: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> Message-ID: <4x7G5rkwCDOvzpEaR6SUV7jYuXaMEypXYESPYZHltBg=.33382daf-a080-43f8-be98-3c74bf8ed73e@github.com> On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez <never at openjdk.org> wrote: > This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed. Yes. ------------- PR: https://git.openjdk.org/jdk/pull/5625 From duke at openjdk.org Wed Nov 16 21:12:26 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Nov 2022 21:12:26 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <QaZ2HofRpsSEDgL-WPjiOtuuk-WMZ7Hvfm7dgNb6OSo=.cd60e9e2-9470-488e-8889-6860b0d33d73@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <QaZ2HofRpsSEDgL-WPjiOtuuk-WMZ7Hvfm7dgNb6OSo=.cd60e9e2-9470-488e-8889-6860b0d33d73@github.com> Message-ID: <EFPZsMWQisEevn72ArFgikPetEgcmTW9mAEYSJxTAp0=.b17fabc0-ca03-406e-b967-d979c481b368@github.com> On Tue, 15 Nov 2022 19:30:23 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's review >> - live review with Sandhya >> - jcheck >> - Sandhya's review >> - fix windows and 32b linux builds >> - add getLimbs to interface and reviews >> - fix 32-bit build >> - make UsePolyIntrinsics option diagnostic >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - ... 
and 13 more: https://git.openjdk.org/jdk/compare/e269dc03...a26ac7db > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 370: > >> 368: // Middle 44-bit limbs of new blocks >> 369: __ vpsrlq(L1, L0, 44, Assembler::AVX_512bit); >> 370: __ vpsllq(TMP2, TMP1, 20, Assembler::AVX_512bit); > > Any particular reason to use `TMP2` here? Can you just update `TMP1` instead (w/ `vpsllq(TMP1, TMP1, 20, Assembler::AVX_512bit);`)? Thanks for the catch. Removed TMP2. (Several refactors ago, `D[01]` and `L[0-2]` used the same registers, because I was running out.. likely forgot to cleanup after I removed 2/3 of the optimizations and re-did register allocation) done ------------- PR: https://git.openjdk.org/jdk/pull/10582 From dnsimon at openjdk.org Wed Nov 16 21:24:37 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 16 Nov 2022 21:24:37 GMT Subject: Integrated: 8296956: [JVMCI] HotSpotResolvedJavaFieldImpl.getIndex returns wrong value In-Reply-To: <KZxa8Bort_KINFAAyAIGsZgRF6PBRxJGONS_u8t_rVI=.de61e7e8-518a-4cf5-b2b5-68cf4f71cb42@github.com> References: <KZxa8Bort_KINFAAyAIGsZgRF6PBRxJGONS_u8t_rVI=.de61e7e8-518a-4cf5-b2b5-68cf4f71cb42@github.com> Message-ID: <6LiYFK7I-qA4tGE1_urblEa7HcJd94-Chh7PQVoKXcw=.34e2ae7f-87dd-44b1-aa0d-6eae0f47ec59@github.com> On Mon, 14 Nov 2022 19:37:20 GMT, Doug Simon <dnsimon at openjdk.org> wrote: > This PR fixes a bug related to `HotSpotResolvedJavaFieldImpl.index`. Its value is passed into the `HotSpotResolvedJavaFieldImpl` constructor as an `int`, and is returned by `getIndex()` as an `int` but it was stored as a `short`. This meant that unsigned 16-bit values were not handled correctly. > > Also included are some related JVMCI cleanups: > * added and fixed doc related to `ResolvedJavaField.getOffset()` > * replaced assertions with always-enabled checks This pull request has now been integrated. 
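The truncation described in the quoted summary above can be sketched as follows. This is only an illustration under stated assumptions: `FieldIndexTruncation`, `readSigned`, `readUnsigned` and `readAsInt` are hypothetical names and are not taken from `HotSpotResolvedJavaFieldImpl`. It shows why an index that arrives as an `int` but is kept in a `short` comes back wrong once it exceeds 32767, plus two plausible remedies (reading the `short` back as unsigned, or storing an `int`). Which remedy the changeset actually uses is not stated in this thread.

```java
// Hypothetical stand-in for a class that receives an index as int,
// stores it in a narrower field, and hands it back as int.
final class FieldIndexTruncation {

    // Mirrors the described bug: narrowing to short, then widening
    // back to int sign-extends, so values above 32767 turn negative.
    static int readSigned(int index) {
        short stored = (short) index;
        return stored;
    }

    // One possible remedy: keep the short field but read it as an
    // unsigned 16-bit value.
    static int readUnsigned(int index) {
        short stored = (short) index;
        return Short.toUnsignedInt(stored);
    }

    // Another possible remedy: store the value as an int, so no
    // narrowing happens at all.
    static int readAsInt(int index) {
        int stored = index;
        return stored;
    }

    public static void main(String[] args) {
        int index = 40000; // fits in unsigned 16 bits, not in a signed short
        System.out.println(readSigned(index));   // -25536
        System.out.println(readUnsigned(index)); // 40000
        System.out.println(readAsInt(index));    // 40000
    }
}
```

Indices up to 32767 behave identically in all three variants, which is why a bug of this kind can go unnoticed until a sufficiently large index shows up.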
Changeset: 95c390ec Author: Doug Simon <dnsimon at openjdk.org> URL: https://git.openjdk.org/jdk/commit/95c390ec75eec31cdf613c8bb236e43aa65a1bb5 Stats: 180 lines in 7 files changed: 137 ins; 5 del; 38 mod 8296956: [JVMCI] HotSpotResolvedJavaFieldImpl.getIndex returns wrong value Reviewed-by: thartmann, never ------------- PR: https://git.openjdk.org/jdk/pull/11142 From duke at openjdk.org Wed Nov 16 21:34:22 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Nov 2022 21:34:22 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <r_1O7IZ10L42qLo58Al4P7vopxNimqQf_JGXTozhSrQ=.02f7ac95-6ee2-409c-8730-649eae6835fe@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> <i2KXAYD9RaayPF684xwQgBredelBO5O6oyO28aETB0E=.329c21c6-84cb-4354-849d-0fdf8ca19e59@github.com> <r_1O7IZ10L42qLo58Al4P7vopxNimqQf_JGXTozhSrQ=.02f7ac95-6ee2-409c-8730-649eae6835fe@github.com> Message-ID: <62fFZ_M2aZHxrUV73RXbAVDUIGtpMOUFL4HdjqLqFJI=.392ad4b7-1917-49c6-b36b-178beee57102@github.com> On Tue, 15 Nov 2022 19:38:56 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >>> On other hand, there are functions like poly1305_multiply8_avx512 and poly1305_process_blocks_avx512 that use a lot of temp registers. I think it makes sense to keep those as 'function-header declarations'. >> >> I agree with you on `poly1305_process_blocks_avx512`, but `poly1305_multiply8_avx512` already takes 8 arguments. Putting 8 more arguments for temps doesn't look prohibitive. >> >>> I think it makes sense to keep those as 'function-header declarations'. >> >> IMO it's not enough. Ideally, if there are any implicit usages, those should be clearly spelled out at every call site. 
> > Changed just the three `*limbs*` functions. Lifted everything pretty much to just `poly1305_process_blocks_avx512` and `generate_poly1305_processBlocks` (i.e. two register maps) Took some time to make it 'reasonable' again, but I think it makes sense. (But then, true test would be me looking a month later or if it makes sense to others) Had to cleanup the names; 'local' names could all be play on `tmp`.. but the register reuse is much clearer from the 'global' names. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Nov 16 21:34:26 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Nov 2022 21:34:26 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17] In-Reply-To: <fw96wWvrsbFqCZc16QUimZ6Cg0OhKARsTY-oeRbfT-I=.5bc6a1e2-d99d-4784-9b41-d4722793c613@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <7hXP-vwxc6J7fklu8QuJqiIcSQRff-QyR1SZ0Fzfqmc=.33a38a51-38c3-451a-a756-ed538507f04e@github.com> <fw96wWvrsbFqCZc16QUimZ6Cg0OhKARsTY-oeRbfT-I=.5bc6a1e2-d99d-4784-9b41-d4722793c613@github.com> Message-ID: <uTDRjOOo2n7sl7BHwuZ0y7H-YTuqXmzJYibjIp9RbN0=.718c0b9c-c71b-468f-86e8-a0a40daca038@github.com> On Tue, 15 Nov 2022 19:44:16 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: >> >> - Vladimir's review comments >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's review >> - live review with Sandhya >> - jcheck >> - Sandhya's review >> - fix windows and 32b linux builds >> - add getLimbs to interface and reviews >> - fix 32-bit build >> - ... 
and 15 more: https://git.openjdk.org/jdk/compare/7357a1a3...8f5942d9 > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 1004: > >> 1002: __ jcc(Assembler::less, L_process16Loop); >> 1003: >> 1004: poly1305_process_blocks_avx512(input, length, > > I'd like to see a comment here explaining what register effects are implicit. > > `poly1305_process_blocks_avx512` has the following comment, but it doesn't mention xmm registers: > > // Register Map: > // reserved: rsp, rbp, rcx > // PARAMs: rdi, rbx, rsi, r8-r12 > // poly1305_multiply_scalar clobbers: r13-r15, rax, rdx Just redid the register allocation, comments, names, function parameters.. hope its better ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Nov 16 21:34:22 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Nov 2022 21:34:22 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16] In-Reply-To: <jOU2YrKbL5IN9drxis9yPT_AlliER47yfWT82oCm8_g=.92f9a11a-4f49-4897-974d-f77c5d4b6a21@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <QDsvvBP-whhXZ1wVWPPWspd_rbmy6QRmycHBhd0SRw8=.b709dd5e-2b99-4f88-a8af-0df9c7708235@github.com> <6oNkr_1EGAdRQqa7GDrsa-tIpV_kO-_HJAjdA8Mkf28=.34da74e0-c6f1-4eec-bc45-8c8dd02f68f0@github.com> <-JVYIHKOY_LuVTqyH5xuubtPdk8pK_wi5z-8pestRis=.e63938ab-0ac2-4880-8238-e6e6d8debf03@github.com> <OfSiSh0ho8oFGm5lOgCGxw84XjHzQHdXvyDOcZBQBXo=.f243574a-c6aa-47b2-8bb1-7951232bf83d@github.com> <9p2RTAI9FPWstQu0OtpSmSB7dqhFwmxbw86zZQg4GtU=.1be10660-ee5d-4654-9d4e-4fe3e449fd9b@github.com> <jOU2YrKbL5IN9drxis9yPT_AlliER47yfWT82oCm8_g=.92f9a11a-4f49-4897-974d-f77c5d4b6a21@github.com> Message-ID: <9Mqr0y4WvsMWZ0WjKVW1gJPZ7tJAuVmZcEVGXNTU8uU=.4d4ffb1c-2813-4567-8495-1d8452079a79@github.com> On Tue, 15 Nov 2022 23:51:22 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Added a comment, hopefully less confusing. 
> > On a second thought, passing derived pointers as arguments doesn't mix well with safepoint awareness. > (And this stub eventually has to become safepoint aware.) > Deriving a pointer inside the stub from a base oop and offset is trivial, recovering base oop from derived pointer is hard. > > It doesn't mean we have to address it right now. Left it as is. I also postponed Bytebuffer support for now, for a separate PR.. we can also fix it then? ------------- PR: https://git.openjdk.org/jdk/pull/10582 From amenkov at openjdk.org Wed Nov 16 22:29:40 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 16 Nov 2022 22:29:40 GMT Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal In-Reply-To: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> References: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> Message-ID: <76WJPgZeoJ2ysTVLtjOJ_UVcDtidmptrSe618QEyUR4=.f3c72cfe-2f1e-4ff7-b496-f30e3dcf1ecd@github.com> On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez <never at openjdk.org> wrote: > This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed. Marked as reviewed by amenkov (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/5625 From coleenp at openjdk.org Wed Nov 16 22:30:19 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 16 Nov 2022 22:30:19 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <Aw2_2T-_e-gTLXM88dtIhSKX2o5HdfNf8ozjYd7mlA8=.0e8e2c38-fd83-491b-9417-58c05907bc28@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file The release note is closed/delivered which I think blocked reviewing this PR. Please review! 
------------- PR: https://git.openjdk.org/jdk/pull/11023 From duke at openjdk.org Wed Nov 16 22:41:13 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Nov 2022 22:41:13 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14] In-Reply-To: <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <BTbk9ljmz_Cwa7x7buftpILEvm88PQNE8frL57YTQlw=.fe6189e1-fea7-401f-87e5-7ff7417cd9f0@github.com> <L4R9rjMDy1jtGusy2kAN13OrRK9P6UwQvr5jaGZNHUU=.61bf2e0d-7739-4e54-967e-870bff7e52f5@github.com> Message-ID: <_Syg8xU1sH0-zQuv70r8OKuK9KwSIHvl4GzdJF_Gy9s=.247f2738-5d1e-4bf9-ac1e-93034d446f7b@github.com> On Fri, 11 Nov 2022 01:43:46 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> live review with Sandhya > > Overall, it looks good. @iwanowww Answered your review comments, please take a look again? Thanks again! ------------- PR: https://git.openjdk.org/jdk/pull/10582 From sviswanathan at openjdk.org Wed Nov 16 22:50:31 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Nov 2022 22:50:31 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20] In-Reply-To: <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> Message-ID: <y8FF70sBOLOjFDFIWvuoM2-AzKHFFoNOmfY4ezyC0EY=.4f5b97fd-8030-4c81-87f4-a698dfd8813d@github.com> On Wed, 16 Nov 2022 20:52:14 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. 
Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >>
>> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
>> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
>> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
>> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
>> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>>
>> and after:
>>
>> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
>> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
>> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
>> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
>> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s
> > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > redo register alloc with explicit func params src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 917: > 915: // Cleanup > 916: __ vpxorq(xmm0, xmm0, xmm0, Assembler::AVX_512bit); > 917: __ vpxorq(xmm1, xmm1, xmm1, Assembler::AVX_512bit); You could use T0, T1 in place of xmm0, xmm1 here.
------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Wed Nov 16 23:10:31 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 16 Nov 2022 23:10:31 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20] In-Reply-To: <y8FF70sBOLOjFDFIWvuoM2-AzKHFFoNOmfY4ezyC0EY=.4f5b97fd-8030-4c81-87f4-a698dfd8813d@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> <y8FF70sBOLOjFDFIWvuoM2-AzKHFFoNOmfY4ezyC0EY=.4f5b97fd-8030-4c81-87f4-a698dfd8813d@github.com> Message-ID: <9sCYEHe6Q8oPYHxAOWE4DjPGFalX0TIpaXkyWaGSyGk=.eb1e9867-8b83-4fed-a809-8c871cda8a23@github.com> On Wed, 16 Nov 2022 22:47:37 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> redo register alloc with explicit func params > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 917: > >> 915: // Cleanup >> 916: __ vpxorq(xmm0, xmm0, xmm0, Assembler::AVX_512bit); >> 917: __ vpxorq(xmm1, xmm1, xmm1, Assembler::AVX_512bit); > > You could use T0, T1 in place of xmm0, xmm1 here. Or simply switch to `vzeroall` for `xmm0` - `xmm15`. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Wed Nov 16 23:19:15 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 16 Nov 2022 23:19:15 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20] In-Reply-To: <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> Message-ID: <8UaOUsGNlGnh87OM1y8tMC6pVfeFn0nYUHyhvT7J-ss=.4e581e46-4759-4f69-8a6a-6383bc6f16de@github.com> On Wed, 16 Nov 2022 20:52:14 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >>
>> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
>> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
>> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
>> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
>> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>>
>> and after:
>>
>> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
>> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
>> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
>> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
>> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s
> > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > redo register alloc with explicit func params src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 756: > 754: > 755: // Store R^8-R for later use > 756: __ evmovdquq(Address(rsp, 64*0), B0, Assembler::AVX_512bit); Could these vector spills be eliminated? I counted 8 spare zmm registers available across the vector loop (xmm7-xmm12, xmm30, xmm31). And here's what is explicitly used in `process256Loop`: D0 D1 = xmm2-xmm3 B0 B1 B2 B3 B4 B5 = xmm19-xmm24 TMP = xmm6 A0 A1 A2 A3 A4 A5 = xmm13-xmm18 R0 R1 R2 R1P R2P = xmm25-xmm29 T0 T1 T2 T3 T4 T5 = xmm0-xmm5 ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Nov 16 23:19:16 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Nov 2022 23:19:16 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20] In-Reply-To: <8UaOUsGNlGnh87OM1y8tMC6pVfeFn0nYUHyhvT7J-ss=.4e581e46-4759-4f69-8a6a-6383bc6f16de@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> <8UaOUsGNlGnh87OM1y8tMC6pVfeFn0nYUHyhvT7J-ss=.4e581e46-4759-4f69-8a6a-6383bc6f16de@github.com> Message-ID: <Rwzj8FudpIjNzfdKft459T7lSt7O1IFD5FrX2o1fxpg=.c27afeef-dbaa-42d5-b0d8-42d0ce8f9247@github.com> On Wed, 16 Nov 2022 23:12:28 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> redo register
alloc with explicit func params > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 756: > >> 754: >> 755: // Store R^8-R for later use >> 756: __ evmovdquq(Address(rsp, 64*0), B0, Assembler::AVX_512bit); > > Could these vector spills be eliminated? I counted 8 spare zmm registers available across the vector loop (xmm7-xmm12, xmm30, xmm31). > > And here's what is explicitly used in `process256Loop`: > > D0 D1 = xmm2-xmm3 > B0 B1 B2 B3 B4 B5 = xmm19-xmm24 > TMP = xmm6 > A0 A1 A2 A3 A4 A5 = xmm13-xmm18 > R0 R1 R2 R1P R2P = xmm25-xmm29 > T0 T1 T2 T3 T4 T5 = xmm0-xmm5 Interesting!! Let me try that! ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Nov 16 23:19:19 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Nov 2022 23:19:19 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20] In-Reply-To: <9sCYEHe6Q8oPYHxAOWE4DjPGFalX0TIpaXkyWaGSyGk=.eb1e9867-8b83-4fed-a809-8c871cda8a23@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> <y8FF70sBOLOjFDFIWvuoM2-AzKHFFoNOmfY4ezyC0EY=.4f5b97fd-8030-4c81-87f4-a698dfd8813d@github.com> <9sCYEHe6Q8oPYHxAOWE4DjPGFalX0TIpaXkyWaGSyGk=.eb1e9867-8b83-4fed-a809-8c871cda8a23@github.com> Message-ID: <kyCbBQ9ukAXadLWfvdsrM1-u40o8voaN9PM7sUY8YGs=.ffed6f74-45c0-4fd0-bf9f-967a6f40f6e9@github.com> On Wed, 16 Nov 2022 23:08:16 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 917: >> >>> 915: // Cleanup >>> 916: __ vpxorq(xmm0, xmm0, xmm0, Assembler::AVX_512bit); >>> 917: __ vpxorq(xmm1, xmm1, xmm1, Assembler::AVX_512bit); >> >> You could use T0, T1 in place of xmm0, xmm1 here. > > Or simply switch to `vzeroall` for `xmm0` - `xmm15`. ah.. I remember thinking about doing that.. 
`vzeroall` isn't encoded yet and I figured since I already have to do the xmm16-29, might as well do them all.. should I add that instruction too? ------------- PR: https://git.openjdk.org/jdk/pull/10582 From sspitsyn at openjdk.org Wed Nov 16 23:23:20 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 16 Nov 2022 23:23:20 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <G9LGwMDg1s-ZYehB_WUZ8G7ElSqhSXhEhZ8gIMHk4jY=.575720ff-bf8a-4f38-a839-e0ed491b8020@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file Looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11023 From vlivanov at openjdk.org Wed Nov 16 23:41:16 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 16 Nov 2022 23:41:16 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20] In-Reply-To: <kyCbBQ9ukAXadLWfvdsrM1-u40o8voaN9PM7sUY8YGs=.ffed6f74-45c0-4fd0-bf9f-967a6f40f6e9@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> <y8FF70sBOLOjFDFIWvuoM2-AzKHFFoNOmfY4ezyC0EY=.4f5b97fd-8030-4c81-87f4-a698dfd8813d@github.com> <9sCYEHe6Q8oPYHxAOWE4DjPGFalX0TIpaXkyWaGSyGk=.eb1e9867-8b83-4fed-a809-8c871cda8a23@github.com> <kyCbBQ9ukAXadLWfvdsrM1-u40o8voaN9PM7sUY8YGs=.ffed6f74-45c0-4fd0-bf9f-967a6f40f6e9@github.com> Message-ID: <2mAogemIauwWUYmfUtLgDNxtkskab6gjPrcFIQghuYk=.3f88c715-2d68-4995-9eaa-e989b6b5be8f@github.com> On Wed, 16 Nov 2022 23:14:45 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Or simply switch to `vzeroall` for `xmm0` - `xmm15`. > > ah.. I remember thinking about doing that.. `vzeroall` isnt encoded yet and I figured since I already have to do the xmm16-29, might as well do them all.. should I add that instruction too? Yes, please. And for the upper half of register file, just code it as a loop over register range: for (int rxmm_num = 16; rxmm_num < 30; rxmm_num++) { XMMRegister rxmm = as_XMMRegister(rxmm_num); __ vpxorq(rxmm, rxmm, rxmm, Assembler::AVX_512bit); } or even // Zeroes zmm16-zmm31. 
for (XMMRegister rxmm = xmm16; rxmm->is_valid(); rxmm = rxmm->successor()) { __ vpxorq(rxmm, rxmm, rxmm, Assembler::AVX_512bit); } ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Nov 16 23:45:29 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Nov 2022 23:45:29 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20] In-Reply-To: <2mAogemIauwWUYmfUtLgDNxtkskab6gjPrcFIQghuYk=.3f88c715-2d68-4995-9eaa-e989b6b5be8f@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> <y8FF70sBOLOjFDFIWvuoM2-AzKHFFoNOmfY4ezyC0EY=.4f5b97fd-8030-4c81-87f4-a698dfd8813d@github.com> <9sCYEHe6Q8oPYHxAOWE4DjPGFalX0TIpaXkyWaGSyGk=.eb1e9867-8b83-4fed-a809-8c871cda8a23@github.com> <kyCbBQ9ukAXadLWfvdsrM1-u40o8voaN9PM7sUY8YGs=.ffed6f74-45c0-4fd0-bf9f-967a6f40f6e9@github.com> <2mAogemIauwWUYmfUtLgDNxtkskab6gjPrcFIQghuYk=.3f88c715-2d68-4995-9eaa-e989b6b5be8f@github.com> Message-ID: <8Wbchzsuzf9Vx7btDUoVEevuDmCdRCuKMlZZjBEyXmg=.8c6ef831-9f5c-4b32-b007-c0bca9161c9f@github.com> On Wed, 16 Nov 2022 23:39:00 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> ah.. I remember thinking about doing that.. `vzeroall` isnt encoded yet and I figured since I already have to do the xmm16-29, might as well do them all.. should I add that instruction too? > > Yes, please. And for the upper half of register file, just code it as a loop over register range: > > for (int rxmm_num = 16; rxmm_num < 30; rxmm_num++) { > XMMRegister rxmm = as_XMMRegister(rxmm_num); > __ vpxorq(rxmm, rxmm, rxmm, Assembler::AVX_512bit); > } > > or even > > // Zeroes zmm16-zmm31. > for (XMMRegister rxmm = xmm16; rxmm->is_valid(); rxmm = rxmm->successor()) { > __ vpxorq(rxmm, rxmm, rxmm, Assembler::AVX_512bit); > } Will do.. ("loop" erm.. wow.. "duh, this isn't assembler!") Thanks!! 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From dholmes at openjdk.org Thu Nov 17 01:40:17 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 01:40:17 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <PKnmrwRKsZD5sucQrFi1FPDaW36RR65IaNbuZmIEOmk=.64471982-a650-42f0-9735-88db6604c582@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file Still think this needs a CSR request - sorry. If it warrants a RN it warrants a CSR request. 
------------- PR: https://git.openjdk.org/jdk/pull/11023 From dholmes at openjdk.org Thu Nov 17 01:48:21 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 01:48:21 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v6] In-Reply-To: <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> Message-ID: <8tD0490Li8PSucFYka-cFON3vgNAK-Ti79yN9K3oKNo=.59d97ce0-227a-467d-8a7c-54d00b31aa7f@github.com> On Tue, 15 Nov 2022 18:52:37 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Review comments. Seems okay to me. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11033 From dholmes at openjdk.org Thu Nov 17 02:09:24 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 02:09:24 GMT Subject: RFR: 8296776: Stop using mtNone as marker for CHeap allocations in GrowableArray [v8] In-Reply-To: <4XP5mF36J2i2ftjKr21gkDhQhJXtNdpID7wcLVd6IP8=.6fbe9884-cae6-4c09-8aa8-e76162e2133a@github.com> References: <YrdcPv9VNBUbJX7v-JSRgsccmLmLije1sea2Tnf8wBo=.b8e2f370-2cbd-4075-a548-751ec019f46c@github.com> <yjTlwjvdvgoyI6ScfYzSvhK54QiXMAAnT0gO0nroB3o=.ca108a44-c1a3-449d-8f95-259a1d50131b@github.com> <4XP5mF36J2i2ftjKr21gkDhQhJXtNdpID7wcLVd6IP8=.6fbe9884-cae6-4c09-8aa8-e76162e2133a@github.com> Message-ID: <NZKdkdW2lOwEfxQALaizJbE1mJfgNqlvebwTElfpWFI=.b92ae514-694f-4f75-9670-24198b55e35c@github.com> On Wed, 16 Nov 2022 09:15:06 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Spelling > > Thanks for the reviews! Sorry @stefank this one slipped through the cracks. This simplified version looks better to me. Sorry for not responding to your query sooner. 
------------- PR: https://git.openjdk.org/jdk/pull/11086 From coleenp at openjdk.org Thu Nov 17 02:14:18 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 17 Nov 2022 02:14:18 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <XqZ4X7HNoyS_g4akUJUlN3d-G0UTCl8E1rkphT4zweA=.fdbf93b5-de92-4a6a-96bb-0908063c6eee@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file https://bugs.openjdk.org/browse/JDK-8297169 Probably needs better wording. 
------------- PR: https://git.openjdk.org/jdk/pull/11023 From dholmes at openjdk.org Thu Nov 17 02:19:19 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 02:19:19 GMT Subject: RFR: JDK-8296796: Provide clean, platform-agnostic interface to C-heap trimming [v2] In-Reply-To: <vBtNU4U04jMPutD1fcypaDk77fqS3pAbiVg7QHZa1i4=.624dd7f0-3500-4153-9c5f-56df44eef1f2@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> <vBtNU4U04jMPutD1fcypaDk77fqS3pAbiVg7QHZa1i4=.624dd7f0-3500-4153-9c5f-56df44eef1f2@github.com> Message-ID: <iEUHXJRgtb4XUbnFjt78pilRbh_UDIxMAwTcRBflVcg=.8d7f582f-f428-4793-9eff-b2152fa3d55c@github.com> On Mon, 14 Nov 2022 07:32:25 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. >> >> We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. That will allow us to add implementations for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback David I don't disagree that C-heap trimming is useful even if only Linux does it. My objection is to defining a platform-agnostic API when only Linux does it.
------------- PR: https://git.openjdk.org/jdk/pull/11089 From dholmes at openjdk.org Thu Nov 17 02:32:52 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 02:32:52 GMT Subject: RFR: 8296926: Sort include lines of files in the include/ directory [v4] In-Reply-To: <Ui-S20RkgTtaXg8YyowdihZ7MAxV8yTEidms18YpKUQ=.adc86da6-1326-448d-ad8e-73be100a6d14@github.com> References: <tl8LtRfG6_BUCdNMucC4JCSbjv_-yFP46CXbqYnvNxs=.cba35dd6-f1ee-41bf-aa87-9381cc064bf6@github.com> <Ui-S20RkgTtaXg8YyowdihZ7MAxV8yTEidms18YpKUQ=.adc86da6-1326-448d-ad8e-73be100a6d14@github.com> Message-ID: <JkY5Uxb3Mwok0Dtm_J9T4ULcVGj8MkaAWzbkDhvf3aE=.aeac2778-fe5c-497e-95bf-b8ef90549154@github.com> On Wed, 16 Nov 2022 11:05:59 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that doesn't serve any practical purpose, IMHO. It also goes against our written style guide around include files. One argument why it was OK to have the files in include/ pushed up to the top of the sorted block was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. >> >> This RFE splits out the 'include/' changes from #11108 / JDK-8296886, so that those changes can be discussed separately. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Remove include/ from test/hotspot files > - Merge remote-tracking branch 'upstream/master' into 8296926_proper_include_lines_for_include_dir_files > - Revert make file changes > - Remove include/ from includes > - 8296926: Use proper include lines for files in include/ Sorry I didn't get a chance to look at this yesterday. A belated "thumbs up". ------------- PR: https://git.openjdk.org/jdk/pull/11133 From dholmes at openjdk.org Thu Nov 17 02:38:55 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 02:38:55 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled [v3] In-Reply-To: <ydQdOtTlFJ407X_AGYWpPSksUJEjvs7B9zZxsOO7EKU=.5fd56bb1-dc5c-4b8f-8891-d44cfec095d4@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <ydQdOtTlFJ407X_AGYWpPSksUJEjvs7B9zZxsOO7EKU=.5fd56bb1-dc5c-4b8f-8891-d44cfec095d4@github.com> Message-ID: <bBXKxyJ1c2PB_VWazKcU6lAYk3HxSKvylQh77DBOPYE=.0dc05d6f-7bdd-4c74-be09-19a9be60ddc4@github.com> On Mon, 14 Nov 2022 09:26:47 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. >> >> The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : >> >> >> #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(0) : NativeCallStack::empty_stack()) >> #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ? \ >> NativeCallStack(1) : NativeCallStack::empty_stack()) >> >> >> and feed them to a callee routine, which usually has the argument defined via const reference, e.g. 
os::malloc:
>>
>>
>> void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack);
>>
>>
>> In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()).
>>
>> However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots):
>>
>>
>> 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>:
>> ...
>> # Load tracking level
>> cb9a77: 48 8d 1d 02 35 78 00 lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE>
>> cb9a7e: 8b 03 mov (%rbx),%eax
>> # detail (3) tracking?
>> cb9a80: 83 f8 03 cmp $0x3,%eax
>> # yes: go and collect callstack
>> cb9a83: 0f 84 57 01 00 00 je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180>
>> # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals:
>> cb9a89: 48 8d 05 30 44 78 00 lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE>
>> cb9a90: f3 0f 6f 00 movdqu (%rax),%xmm0
>> cb9a94: f3 0f 6f 48 10 movdqu 0x10(%rax),%xmm1
>> cb9a99: 0f 11 45 c0 movups %xmm0,-0x40(%rbp)
>> cb9a9d: 0f 11 4d d0 movups %xmm1,-0x30(%rbp)
>> ...
>> # do the actual malloc:
>> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc@plt>
>>
>> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX):
>> cb9b0f: 48 8d 4d c0 lea -0x40(%rbp),%rcx
>> ...
>> cb9b19: e8 f2 b7 f3 ff callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack>
>>
>>
>> This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled.
>>
>> ---------------------
>>
>> The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only.
>>
>> This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC.
>>
>> In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway:
>>
>>
>> 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>:
>> ...
>> # load tracking level
>> cb9907: 48 8d 1d 72 46 78 00 lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE>
>> cb990e: 8b 03 mov (%rbx),%eax
>> # detail (3) tracking?
>> cb9910: 83 f8 03 cmp $0x3,%eax
>> # yes: go and collect callstack
>> cb9913: 0f 84 37 01 00 00 je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160>
>> # no: nothing more to do ...
>> ...
>> # do the actual malloc:
>> cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc@plt>
>> ...
>> # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway:
>> cb9987: 48 8d 4d c0 lea -0x40(%rbp),%rcx
>> ..
>> cb9991: e8 ba b8 f3 ff callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> >> >> >> There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent. >> >> -------------- >> >> Results: >> >> When profiling, I see os::malloc now needs less cycles, and the hotspot around the xmm instructions is not there anymore. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback Ioi and David Sorry for the delay - been swamped the past couple of days. Looks fine. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/11040 From dholmes at openjdk.org Thu Nov 17 02:46:33 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 02:46:33 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v4] In-Reply-To: <bPB4PgPSiG_14IpRKCrO3QnWrIdztrOgxoycffG-9wg=.063eb16c-f6f1-4528-a43e-e17364795148@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> <bPB4PgPSiG_14IpRKCrO3QnWrIdztrOgxoycffG-9wg=.063eb16c-f6f1-4528-a43e-e17364795148@github.com> Message-ID: <7aDbNUCzYhuMPwFg7DXCU8AusTNMtdyGEkWn9oVo9SA=.2ab89784-19ea-4bc8-8927-11a36361e43d@github.com> On Wed, 16 Nov 2022 08:50:10 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. 
>> >> The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Remove trailing whitespace > > Co-authored-by: Andrey Turbanov <turbanoff at gmail.com> > - blessed modifiers > > Co-authored-by: Andrey Turbanov <turbanoff at gmail.com> One nit, otherwise seems okay. Thanks. test/hotspot/jtreg/runtime/ErrorHandling/HsErrFileUtils.java line 53: > 51: * @return > 52: */ > 53: static public File openHsErrFileFromOutput(OutputAnalyzer output) { "Blessed modifier order" should be used on each function. public static ... ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11122 From dholmes at openjdk.org Thu Nov 17 02:46:33 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 02:46:33 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v4] In-Reply-To: <W2dP9QHYcpLsLoHyl6hygw6wDOxtbW-otb1XWPSv0QU=.7c72ac80-dc9f-4a75-b14a-5cd9a870c988@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> <Kpe441SGn_42efi92acEGYSbLGbzLE2__jCqf9kpoUA=.893d018b-f758-4f0e-ac9c-449b968a413a@github.com> <W2dP9QHYcpLsLoHyl6hygw6wDOxtbW-otb1XWPSv0QU=.7c72ac80-dc9f-4a75-b14a-5cd9a870c988@github.com> Message-ID: <csZMZRwEFfKbHZKAXO5brjxEz-ahzoRZ4rKvsUt0ML4=.f8d4ea22-3c48-4bdd-b6c4-b3e781aaf221@github.com> On Mon, 14 Nov 2022 06:37:38 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> test/hotspot/jtreg/runtime/ErrorHandling/HsErrFileUtils.java line 15: >> >>> 13: >>> 14: // extract hs-err file >>> 15: String hs_err_file = output.firstMatch("# *(\\S*hs_err_pid\\d+\\.log)", 1); >> >> I'm not sure this is going to be useful in the way you are trying to use it. This will show the original path to the hs_err file at the time it is created. But jtreg can move things around in the final test result output and place the hs_err file somewhere else. > > We use this pattern in a number of places, e.g. in BadNativeStackInErrorHandlingTest and SafeFetchInErrorHandlingTest.java. Seems to work ok so far. hmm - okay. I'm surprised. 
------------- PR: https://git.openjdk.org/jdk/pull/11122 From dholmes at openjdk.org Thu Nov 17 02:56:23 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 02:56:23 GMT Subject: RFR: JDK-8296931: NMT tests slowed down considerably by JDK-8242181 In-Reply-To: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> References: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> Message-ID: <ixTjmvYdlqKWqCSYok0Dx6OaKhjGWeAClyhDl1V9q9E=.8ca7adcf-8c62-4d83-a956-627c6d65b233@github.com> On Mon, 14 Nov 2022 12:43:41 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > We noticed that NMT tests on our slower PPC machines started failing. > > The reason is that NMT detail reports have become 2-5x slower. This is caused by us now parsing the DWARF debug information to extract source information for each PC in each call stack. That is nice but costly. > > The slowdown is not limited to PPC, it affects all ELF platforms. On my Linux x64 box, runtime/NMT/VirtualAllocCommitMerge.java increased from 20 to 90 seconds. > > --- > > This patch simply removes source info from NMT call stacks. They are not that important for pinpointing leaks and such. I considered more involved solutions, like making them optional via an argument to the NMT report command, but decided against it. The added benefit would be small, not worth much complexity. > > With this patch, on my box with -conc 4 all NMT tests together are about 2.5x faster (2m56 -> 1m09). Sorry, I have been swamped the past couple of days. From the style guide: > As a general rule don't add bug numbers to comments (they would soon overwhelm the code). But if the bug report contains significant information that can't reasonably be added as a comment, then refer to the bug report. I would probably not have added it in this case but it's a judgment call.
------------- PR: https://git.openjdk.org/jdk/pull/11135 From duke at openjdk.org Thu Nov 17 03:23:49 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 17 Nov 2022 03:23:49 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <oaQBTRrtslpMJcgFY4XfZBse2Vgo9p1EMqvDzE2fFj8=.a7e037a5-2a7c-4f1b-b846-6f6f6394b21e@github.com>
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>
> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
> - Added a JMH perf test.
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>
> Perf before:
>
> Benchmark (dataSize) (provider) Mode Cnt Score Error Units
> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s
> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s
> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s
> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s
> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s
>
> and after:
>
> Benchmark (dataSize) (provider) Mode Cnt Score Error Units
> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s
> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s
> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s
> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s
> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  vzeroall, no spill, reg re-map

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10582/files
  - new: https://git.openjdk.org/jdk/pull/10582/files/dbdfd1dc..56aed9b1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=19-20

  Stats: 182 lines in 3 files changed: 15 ins; 44 del; 123 mod
  Patch: https://git.openjdk.org/jdk/pull/10582.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582

PR: https://git.openjdk.org/jdk/pull/10582
From duke at openjdk.org Thu Nov 17 03:23:49 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 17 Nov 2022 03:23:49 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21] In-Reply-To: <oaQBTRrtslpMJcgFY4XfZBse2Vgo9p1EMqvDzE2fFj8=.a7e037a5-2a7c-4f1b-b846-6f6f6394b21e@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <oaQBTRrtslpMJcgFY4XfZBse2Vgo9p1EMqvDzE2fFj8=.a7e037a5-2a7c-4f1b-b846-6f6f6394b21e@github.com> Message-ID: <VZS2fFxgYgi_z45Infcicw9wcN09O9av5TOjIiuVcl8=.a67c2dfa-6eb0-4073-9d83-258376cc4169@github.com>
On Thu, 17 Nov 2022 03:19:15 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
>> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT.
I do think we should detect (R==0 || S ==0) so would like advice please.
>> - Added a JMH perf test.
>> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>>
>> Perf before:
>>
>> Benchmark (dataSize) (provider) Mode Cnt Score Error Units
>> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s
>> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s
>> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s
>> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s
>> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s
>>
>> and after:
>>
>> Benchmark (dataSize) (provider) Mode Cnt Score Error Units
>> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s
>> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s
>> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s
>> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s
>> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
> vzeroall, no spill, reg re-map

@iwanowww Another round ready your way :)

------------- PR: https://git.openjdk.org/jdk/pull/10582
From duke at openjdk.org Thu Nov 17 03:23:51 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 17 Nov 2022 03:23:51 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20] In-Reply-To: <Rwzj8FudpIjNzfdKft459T7lSt7O1IFD5FrX2o1fxpg=.c27afeef-dbaa-42d5-b0d8-42d0ce8f9247@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> <8UaOUsGNlGnh87OM1y8tMC6pVfeFn0nYUHyhvT7J-ss=.4e581e46-4759-4f69-8a6a-6383bc6f16de@github.com> <Rwzj8FudpIjNzfdKft459T7lSt7O1IFD5FrX2o1fxpg=.c27afeef-dbaa-42d5-b0d8-42d0ce8f9247@github.com> Message-ID: <42Jq_3oM24kB-AcDEzAdsHIQwcOZX0y9_boTctpLUa4=.6076a16c-9cc6-4755-9eaf-1f6ca4c1fb85@github.com>
On Wed, 16 Nov 2022 23:16:14 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 756:
>>
>>> 754:
>>> 755: // Store R^8-R for later use
>>> 756: __ evmovdquq(Address(rsp, 64*0), B0, Assembler::AVX_512bit);
>>
>> Could these vector spills be eliminated? I counted 8 spare zmm registers available across the vector loop (xmm7-xmm12, xmm30, xmm31).
>>
>> And here's what is explicitly used in `process256Loop`:
>>
>> D0 D1 = xmm2-xmm3
>> B0 B1 B2 B3 B4 B5 = xmm19-xmm24
>> TMP = xmm6
>> A0 A1 A2 A3 A4 A5 = xmm13-xmm18
>> R0 R1 R2 R1P R2P = xmm25-xmm29
>> T0 T1 T2 T3 T4 T5 = xmm0-xmm5
>
> Interesting!! Let me try that!

Done!

PS: This find really was great!
PPS: I also reordered the map alphabetically and counted in-order... it was just really bugging me!!
------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Thu Nov 17 03:23:52 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 17 Nov 2022 03:23:52 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20] In-Reply-To: <8Wbchzsuzf9Vx7btDUoVEevuDmCdRCuKMlZZjBEyXmg=.8c6ef831-9f5c-4b32-b007-c0bca9161c9f@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <vevFyyVp7yfNgrsssGqolR-fqn6eVcc4y-O1_vOyIJI=.9d7af407-5d4b-4785-ae11-3c234111d381@github.com> <y8FF70sBOLOjFDFIWvuoM2-AzKHFFoNOmfY4ezyC0EY=.4f5b97fd-8030-4c81-87f4-a698dfd8813d@github.com> <9sCYEHe6Q8oPYHxAOWE4DjPGFalX0TIpaXkyWaGSyGk=.eb1e9867-8b83-4fed-a809-8c871cda8a23@github.com> <kyCbBQ9ukAXadLWfvdsrM1-u40o8voaN9PM7sUY8YGs=.ffed6f74-45c0-4fd0-bf9f-967a6f40f6e9@github.com> <2mAogemIauwWUYmfUtLgDNxtkskab6gjPrcFIQghuYk=.3f88c715-2d68-4995-9eaa-e989b6b5be8f@github.com> <8Wbchzsuzf9Vx7btDUoVEevuDmCdRCuKMlZZjBEyXmg=.8c6ef831-9f5c-4b32-b007-c0bca9161c9f@github.com> Message-ID: <Z4vZFqS5Hb7RXR2BdXRDW2IY4dsnyqS_YBS_08mH3zY=.c9d7a361-b0a4-4438-a4fc-56677236a300@github.com> On Wed, 16 Nov 2022 23:41:32 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Yes, please. And for the upper half of register file, just code it as a loop over register range: >> >> for (int rxmm_num = 16; rxmm_num < 30; rxmm_num++) { >> XMMRegister rxmm = as_XMMRegister(rxmm_num); >> __ vpxorq(rxmm, rxmm, rxmm, Assembler::AVX_512bit); >> } >> >> or even >> >> // Zeroes zmm16-zmm31. >> for (XMMRegister rxmm = xmm16; rxmm->is_valid(); rxmm = rxmm->successor()) { >> __ vpxorq(rxmm, rxmm, rxmm, Assembler::AVX_512bit); >> } > > Will do.. ("loop" erm.. wow.. "duh, this isn't assembler!") Thanks!! 
done (Note: disassembler proof for vzeroall encoding

0x7fffed0022f8: vzeroall
0x7fffed0022fb: vpxorq zmm16,zmm16,zmm16
0x7fffed002301: vpxorq zmm17,zmm17,zmm17
0x7fffed002307: vpxorq zmm18,zmm18,zmm18
0x7fffed00230d: vpxorq zmm19,zmm19,zmm19
0x7fffed002313: vpxorq zmm20,zmm20,zmm20
0x7fffed002319: vpxorq zmm21,zmm21,zmm21
0x7fffed00231f: vpxorq zmm22,zmm22,zmm22
0x7fffed002325: vpxorq zmm23,zmm23,zmm23
0x7fffed00232b: vpxorq zmm24,zmm24,zmm24
0x7fffed002331: vpxorq zmm25,zmm25,zmm25
0x7fffed002337: vpxorq zmm26,zmm26,zmm26
0x7fffed00233d: vpxorq zmm27,zmm27,zmm27
0x7fffed002343: vpxorq zmm28,zmm28,zmm28
0x7fffed002349: vpxorq zmm29,zmm29,zmm29
0x7fffed00234f: vpxorq zmm30,zmm30,zmm30
0x7fffed002355: vpxorq zmm31,zmm31,zmm31
0x7fffed00235b: cmp ebx,0x10
0x7fffed00235e: jl 0x7fffed0023e6
)

------------- PR: https://git.openjdk.org/jdk/pull/10582
From never at openjdk.org Thu Nov 17 05:18:13 2022 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 17 Nov 2022 05:18:13 GMT Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal [v2] In-Reply-To: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> References: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> Message-ID: <m315McgLX6HHtkhTbWmCtOxJyrXpzuAb9izn_M1w19M=.91604f43-79aa-4e6e-ae86-6718c088dc3a@github.com>
> This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed.

Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: 8218885: Restore pop_frame and force_early_return functionality for Graal ------------- Changes: https://git.openjdk.org/jdk/pull/5625/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=5625&range=01 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/5625.diff Fetch: git fetch https://git.openjdk.org/jdk pull/5625/head:pull/5625 PR: https://git.openjdk.org/jdk/pull/5625 From never at openjdk.org Thu Nov 17 05:21:32 2022 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 17 Nov 2022 05:21:32 GMT Subject: Integrated: 8218885: Restore pop_frame and force_early_return functionality for Graal In-Reply-To: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> References: <x8a8D_mI2_R0T5iXOYbTPNWSKp3-8Qtj0QABQY_F3sM=.a1780863-066e-4c69-9864-0a5ba7b48255@github.com> Message-ID: <apkAq1Wbq3xRKMXAM6q1dntbMDwVGo8eTfBAq8aLVz4=.2d0c397a-9b7c-4516-bbc4-df3ac5f007cf@github.com> On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez <never at openjdk.org> wrote: > This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed. This pull request has now been integrated. 
Changeset: d61720a4 Author: Tom Rodriguez <never at openjdk.org> URL: https://git.openjdk.org/jdk/commit/d61720a4dc1b3a9c6f7c5e6a2b68fa2b7735d545 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod 8218885: Restore pop_frame and force_early_return functionality for Graal Reviewed-by: kvn, dlong, sspitsyn, amenkov ------------- PR: https://git.openjdk.org/jdk/pull/5625 From kbarrett at openjdk.org Thu Nov 17 06:07:33 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 17 Nov 2022 06:07:33 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v6] In-Reply-To: <cthdqge57PbbrQVd46zbRS9fhxqeHV_vM8GGEsmlrYA=.ed878d6f-0df1-40cf-b24c-90b5d8d305c7@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <RRATTtLqV_g7wnPOn4kbKqM63GPcdtg-2RyRtUSRqgg=.0d060a92-420a-4c43-bb11-addc0cb5ac4c@github.com> <wpIZeGRfbop46i3T4kS9jRfkIiNuXFKvo17ymbsQPEw=.573524bb-48db-4329-9571-10480ce8ee8d@github.com> <fWo7Fw-iffnP6tHcRUeRCy5Q92_nN2k5qG-B5zTmleU=.72f72ac3-f7ae-4c33-a7b2-04043e6952df@github.com> <2Ja9uaGY95zatV-viOoTkNbakqLkWuCDn761HztySZU=.f42b4b8a-2be1-41b1-98c0-12f953e6a88b@github.com> <HyLZBgVxonD9zj8kI96k0aZnWSO9Yk_-cxmAVuW0ZNM=.04766c2b-469e-424d-af7d-d2c428181bc3@github.com> <cthdqge57PbbrQVd46zbRS9fhxqeHV_vM8GGEsmlrYA=.ed878d6f-0df1-40cf-b24c-90b5d8d305c7@github.com> Message-ID: <uBl5h_vs00rdTgyY-eypzq1vVfebRDbYAtqKQlTD5rQ=.2dfbc551-f74c-416e-bc1f-f6158a4c0fbc@github.com> On Wed, 16 Nov 2022 09:06:22 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Updated to use the result from `os::snprintf` in the new commit. > >> A result of -1 only occurs for an encoding error. An encoding error is only possible with multi-byte / wide characters. (See the definition of "encoding error" in C99 7.19.3/14.) We don't use those, so there won't be any encoding errors, so our uses of snprintf never return -1. > > Hi @kimbarrett, > > I am not sure this was true. E.g.
https://stackoverflow.com/questions/65334245/what-is-an-encoding-error-for-sprintf-that-should-return-1 cites some cases where snprintf returns -1 that have nothing to do with multibyte strings. Also, size=0 would return -1 according to SUSv2. > > Note glibc differs and returns the number of chars it *would* have printed. Which is also dangerous in a different way. If you use that number to update the position, the position is not limited to buffer boundaries. So, I think the result of os::snprintf should not be used to update buffer position, at least not without checking. SUSv2 is *ancient*. That got fixed in SUSv3 / POSIX.1-2001. From the Linux man page for snprintf: "Concerning the return value of snprintf(), SUSv2 and C99 contradict each other: when snprintf() is called with size=0 then SUSv2 stipulates an unspecified return value less than 1, while C99 allows str to be NULL in this case, and gives the return value (as always) as the number of characters that would have been written in case the output string has been large enough. POSIX.1-2001 and later align their specification of snprintf() with C99." For many/most places where we might use the return value, an assert of no truncation would be appropriate. (Esp. likely in places where sprintf was being used correctly.) Though there are probably places where something more is needed. 
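
To make the point about return values concrete, here is a minimal sketch of advancing a buffer position only after checking for truncation. It assumes plain C99 / POSIX.1-2001 `vsnprintf` semantics, not HotSpot's `os::snprintf` wrapper, and the helper name `append_checked` is hypothetical (it does not exist in the JDK sources).

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not a HotSpot API): appends formatted text at buf[*pos]
// and advances *pos only when the whole output fit into the remaining space.
// Returns true on success, false on truncation or on a formatting error.
static bool append_checked(char* buf, size_t size, size_t* pos, const char* fmt, ...) {
  assert(buf != nullptr && pos != nullptr);
  if (*pos >= size) {
    return false;  // no room left, not even for the terminating NUL
  }
  const size_t remaining = size - *pos;
  va_list ap;
  va_start(ap, fmt);
  // C99: the result is the number of characters that WOULD have been written
  // (excluding the terminating NUL), or a negative value on error.
  const int n = vsnprintf(buf + *pos, remaining, fmt, ap);
  va_end(ap);
  if (n < 0 || static_cast<size_t>(n) >= remaining) {
    return false;  // error or truncation: n must not be used as a length
  }
  *pos += static_cast<size_t>(n);
  return true;
}
```

Where truncation would indicate a programming error rather than an expected condition, a caller could simply wrap the call in an assert, which is roughly the "assert of no truncation" mentioned above.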
------------- PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Thu Nov 17 06:29:34 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 06:29:34 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v5] In-Reply-To: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> Message-ID: <BAOQ6D96JpcvMDUAKJyzg2WM-6tkZBWuzBCIJKM7WSo=.6ea60134-72bf-4e6f-af5a-50447d43b16a@github.com> > We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. > > The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. 
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: bless modifiers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11122/files - new: https://git.openjdk.org/jdk/pull/11122/files/a72d834a..ba29b09f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11122&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11122&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11122.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11122/head:pull/11122 PR: https://git.openjdk.org/jdk/pull/11122 From xlinzheng at openjdk.org Thu Nov 17 06:38:56 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Thu, 17 Nov 2022 06:38:56 GMT Subject: RFR: 8296975: RISC-V: Enable UseRVA20U64 profile by default [v3] In-Reply-To: <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> <m7r6u5I8dxWkuJhDjQBNDYoX8f5lRWcRMeA98oLkfiI=.d3b7ab8b-3c4e-4a04-8417-58ee20d77172@github.com> Message-ID: <l39Ms1Uc-Jxts38siNz4fY5Lmp3u-U5Ecq8L6pFeS2w=.69a1bff9-0c81-42c7-aa24-5bb3f1c38b75@github.com> On Wed, 16 Nov 2022 05:21:52 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: >> The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. 
>> >> >>> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" >> bool UseRVA20U64 = true {ARCH product} {default} >> bool UseRVC = true {ARCH product} {default} >> openjdk version "20-internal" 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html >> [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > minor issue if users specify command line -XX:+UseRVA20U64 and RVC is not supported Having received no further comments, I think I am also okay with keeping the current PR as it is. @luhenry's suggestion also gives me good ideas: dispatching the handling logic to `UseRVA20U64` and `UseRVA22U64` themselves when the hardware does not support RVC. Thank you. And since `UseRVA20U64` appears to be your code, I guess you would like to see it default to true. I think we can move forward with this, and I will certainly need to keep an eye out for issues after this PR. Thank you all for the reviews and thoughts! 
------------- PR: https://git.openjdk.org/jdk/pull/11155 From stuefe at openjdk.org Thu Nov 17 07:09:21 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 07:09:21 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address In-Reply-To: <Kpe441SGn_42efi92acEGYSbLGbzLE2__jCqf9kpoUA=.893d018b-f758-4f0e-ac9c-449b968a413a@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> <Kpe441SGn_42efi92acEGYSbLGbzLE2__jCqf9kpoUA=.893d018b-f758-4f0e-ac9c-449b968a413a@github.com> Message-ID: <YTKyx9Q919l2R_iwLBe7AiSkaBF1PO1QVNw7gRVHlGA=.d9b47154-e228-4ee6-8326-c2391514d49d@github.com> On Mon, 14 Nov 2022 02:49:01 GMT, David Holmes <dholmes at openjdk.org> wrote: >> We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. >> >> The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. > > Did we not notice or was that exactly what was expected? When JDK-8065895 added that code it was known to generate SEGV _because_ it was outside the allowable range (otherwise there would have been no guarantee). Maybe the SI_KERNEL behaviour has changed since then? 
It doesn't seem an issue to change it to a low address (other than that doesn't work on AIX) but it seems odd to now consider it a bug - seems more like you now consider it too limited because the true address is not given in the sig info? Thanks @dholmes-ora, @turbanoff and @MBaesken ! I fixed the last remark from David and will integrate now. ------------- PR: https://git.openjdk.org/jdk/pull/11122 From stuefe at openjdk.org Thu Nov 17 07:11:10 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 07:11:10 GMT Subject: Integrated: JDK-8296906: VMError::controlled_crash crashes with wrong code and address In-Reply-To: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> Message-ID: <RpasByr6Ku206HhXV0Bk7IKb1zrhM5xE1Qpk-4jxjRQ=.0ceb81dc-17c3-4830-9df5-4546ae467ad9@github.com> On Sun, 13 Nov 2022 09:01:09 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. > > The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. This pull request has now been integrated. 
Changeset: b9d6e83e Author: Thomas Stuefe <stuefe at openjdk.org> URL: https://git.openjdk.org/jdk/commit/b9d6e83e9bc8c37780f6af0f6135cda72ce3c1b2 Stats: 242 lines in 3 files changed: 238 ins; 1 del; 3 mod 8296906: VMError::controlled_crash crashes with wrong code and address Reviewed-by: dholmes, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/11122 From stuefe at openjdk.org Thu Nov 17 07:21:22 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 07:21:22 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 In-Reply-To: <-ER02i5fNCaZX-v56giR7gCbuMkwwpdhg5LBLThVvds=.402ad601-58e3-4529-935a-151e5e76b2b3@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <klNVgLaAprREVI2aALAP1V9p7KHz_B2pyUhoFBJgqvo=.6742030d-5184-44e6-9b03-0c59c2a8d8a6@github.com> <-ER02i5fNCaZX-v56giR7gCbuMkwwpdhg5LBLThVvds=.402ad601-58e3-4529-935a-151e5e76b2b3@github.com> Message-ID: <3fS41Mj_ubTpa_Yst_o20YWEPedypa7njvogcZf_zrk=.592f000f-bb5b-4954-af53-c2c65861f130@github.com> On Sun, 13 Nov 2022 22:55:52 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Please don't add uses of `jio_snprintf` or `::snprintf` to hotspot. Use `os::snprintf`. >> >> Regarding `jio_snprintf`, see https://bugs.openjdk.org/browse/JDK-8198918. >> Regarding `os::snprintf` and `os::vsnprintf`, see https://bugs.openjdk.org/browse/JDK-8285506. >> >> I think the only reason we haven't marked `::sprintf` and `::snprintf` forbidden >> (FORBID_C_FUNCTION) is there are a lot of uses, and nobody has gotten around >> to dealing with it. `::snprintf` in the list of candidates for >> https://bugs.openjdk.org/browse/JDK-8214976, some of which have already been >> marked. But I don't see new bugs for the as-yet unmarked ones. >> >> As a general note, as a reviewer my preference is against non-trivial and >> persnickety code changes that are scattered all over the code base. 
For >> something like this I'd prefer multiple more bite-sized changes that were >> dealing with specific uses. I doubt everyone agrees with me though. > >> Please don't add uses of `jio_snprintf` or `::snprintf` to hotspot. Use `os::snprintf`. > > Updated to use os::snprintf, except the files under adlc where the os::snprintf definition is not included. The use of snprintf could be cleaned up with existing code in the future. > >> >> Regarding `jio_snprintf`, see https://bugs.openjdk.org/browse/JDK-8198918. Regarding `os::snprintf` and `os::vsnprintf`, see https://bugs.openjdk.org/browse/JDK-8285506. >> >> I think the only reason we haven't marked `::sprintf` and `::snprintf` forbidden (FORBID_C_FUNCTION) is there are a lot of uses, and nobody has gotten around to dealing with it. `::snprintf` in the list of candidates for https://bugs.openjdk.org/browse/JDK-8214976, some of which have already been marked. But I don't see new bugs for the as-yet unmarked ones. >> >> As a general note, as a reviewer my preference is against non-trivial and persnickety code changes that are scattered all over the code base. For something like this I'd prefer multiple more bite-sized changes that were dealing with specific uses. I doubt everyone agrees with me though. > > It makes sense to me. I'd better focus on the building issue in this PR. > > Thank you for the review! Hi @XueleiFan and @kimbarrett , I agree with the change if we, as Kim suggested, add assertions for truncation where we use the return value of snprintf. I am not fully happy with the solution though, since printing is notoriously runtime-data dependent and runtime data can change in release builds. So we could have truncation at a customer that we never see in our tests with debug builds. But seeing that this patch takes so long now and blocks the MacOS build, I don't want to block it. We can improve the code in follow-up RFEs. These places would be a lot simpler with stringStream. 
Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Thu Nov 17 07:38:28 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 07:38:28 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v3] In-Reply-To: <-YBmWkv1MgkIsEoMtU2nWWZouVioSN-k6Tz3-D6K6yw=.0c867d12-4532-4369-8a94-e23c1d667408@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> <-YBmWkv1MgkIsEoMtU2nWWZouVioSN-k6Tz3-D6K6yw=.0c867d12-4532-4369-8a94-e23c1d667408@github.com> Message-ID: <gYvhtjL7cGOAcc7on8WCAgqChttNq-S9JIQRO-47TNE=.48436d46-4adf-4c7d-8387-29243862ff91@github.com> On Wed, 16 Nov 2022 17:25:38 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors' of github.com:tstuefe/jdk into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors >> - Feedback David > > src/hotspot/share/utilities/vmError.cpp line 1635: > >> 1633: // Any information (signal, context, siginfo etc) printed here should use the function >> 1634: // arguments, not the information stored in *this, since those describe the primary crash. >> 1635: char tmp[256]; // cannot use global scratch buffer > > Is there any problems with making this static? Given that we care about the stack depth when we have repeated crashes. I think not. At this point we are single threaded. I'll convert to static. 
------------- PR: https://git.openjdk.org/jdk/pull/11118 From stuefe at openjdk.org Thu Nov 17 07:46:22 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 07:46:22 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v3] In-Reply-To: <-YBmWkv1MgkIsEoMtU2nWWZouVioSN-k6Tz3-D6K6yw=.0c867d12-4532-4369-8a94-e23c1d667408@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> <-YBmWkv1MgkIsEoMtU2nWWZouVioSN-k6Tz3-D6K6yw=.0c867d12-4532-4369-8a94-e23c1d667408@github.com> Message-ID: <ssm98JXq9y_-z7Em7AbEfff4nhOSUCV48c5_JBEuuVs=.162e5afe-b0c3-42a1-aabe-4e162a0c3a9f@github.com> On Wed, 16 Nov 2022 17:32:33 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors' of github.com:tstuefe/jdk into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors >> - Feedback David > > src/hotspot/share/utilities/vmError.cpp line 1653: > >> 1651: st->print_cr("]"); >> 1652: #ifdef ASSERT >> 1653: if (ErrorLogSecondaryErrorDetails) { > >> But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. > > Should there be logic here that at least guarantee progress? > ```c++ > if (ErrorLogSecondaryErrorDetails && !_some_bool) { > _some_bool = true; > [...] > } > _some_bool = false; It would work horizontally too, though, since it would disable call stack printing for the next secondary error. I still think yours is a good idea. 
We can extend it to using a counter and only allow a small number (2-5) of secondary errors to get printed out with their call stacks. Because if we have more than, say, 5 recursive errors, chances are they all trip over the same thing anyway and more call stacks won't tell you anything new. And then, if we limit the number of these things, maybe we can afford to leave this feature on by default in debug builds. That addresses the concern @dholmes-ora voiced, about having to switch on the feature first. ------------- PR: https://git.openjdk.org/jdk/pull/11118 From dholmes at openjdk.org Thu Nov 17 07:49:34 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 07:49:34 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address In-Reply-To: <YTKyx9Q919l2R_iwLBe7AiSkaBF1PO1QVNw7gRVHlGA=.d9b47154-e228-4ee6-8326-c2391514d49d@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> <Kpe441SGn_42efi92acEGYSbLGbzLE2__jCqf9kpoUA=.893d018b-f758-4f0e-ac9c-449b968a413a@github.com> <YTKyx9Q919l2R_iwLBe7AiSkaBF1PO1QVNw7gRVHlGA=.d9b47154-e228-4ee6-8326-c2391514d49d@github.com> Message-ID: <MYJCFzWyHzIs-O6mmfR2cVX3MMc62WnPJATGL6wbvtU=.daa0e04d-189f-4060-9024-7df086cb9fdd@github.com> On Thu, 17 Nov 2022 07:07:16 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Did we not notice or was that exactly what was expected? When JDK-8065895 added that code it was known to generate SEGV _because_ it was outside the allowable range (otherwise there would have been no guarantee). Maybe the SI_KERNEL behaviour has changed since then? It doesn't seem an issue to change it to a low address (other than that doesn't work on AIX) but it seems odd to now consider it a bug - seems more like you now consider it too limited because the true address is not given in the sig info? > > Thanks @dholmes-ora, @turbanoff and @MBaesken ! 
> > I fixed the last remark from David and will integrate now. @tstuefe the test is failing in our CI on aarch64. I will file a bug ------------- PR: https://git.openjdk.org/jdk/pull/11122 From dholmes at openjdk.org Thu Nov 17 07:58:29 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 07:58:29 GMT Subject: RFR: JDK-8296906: VMError::controlled_crash crashes with wrong code and address [v5] In-Reply-To: <BAOQ6D96JpcvMDUAKJyzg2WM-6tkZBWuzBCIJKM7WSo=.6ea60134-72bf-4e6f-af5a-50447d43b16a@github.com> References: <mGuOyJRI3f0HyQ2pXqtze5n38PO0U-0am6MfjsHc2RQ=.e0b3ebbf-c2fd-4de6-bebe-316b53159c30@github.com> <BAOQ6D96JpcvMDUAKJyzg2WM-6tkZBWuzBCIJKM7WSo=.6ea60134-72bf-4e6f-af5a-50447d43b16a@github.com> Message-ID: <tQOwNJUwfNZrnbNTTrZY5T4s1ZkNPvcfhq2b-drE2ps=.c8e7729c-f78b-4671-b123-00772c3f3c56@github.com> On Thu, 17 Nov 2022 06:29:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> We have VMError::controlled_crash() in debug builds, whose job is to trigger clearly defined faults to test VM error reporting. VMError::controlled_crash(14) (the numbers don't mean anything and probably should be replaced with clear enums) is to crash with a SIGSEGV + SEGV_MAPERR mapping error at a well-known crash address. But this does not work on Linux, where it generates a SIGSEGV with SI_KERNEL instead. We never noticed since it had not been used in tests so far. >> >> The reason for SI_KERNEL was that the crash address we use (0xABC0000000000ABC) was outside the user-space address range on Linux. This patch redefines the crash address to a value that really generates a SIGSEGV + SEGV_MAPERR on all our platforms. That's one line; the rest is a new regression test that checks that signal info is printed correctly in hs-err files. 
> > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > bless modifiers Filed [JDK-8297184](https://bugs.openjdk.org/browse/JDK-8297184) ------------- PR: https://git.openjdk.org/jdk/pull/11122 From stefank at openjdk.org Thu Nov 17 08:16:28 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Nov 2022 08:16:28 GMT Subject: RFR: 8296886: Fix various include sort order issues [v2] In-Reply-To: <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> Message-ID: <HAQlrE-kNn9bi_clIGTIEp059Ub1t_BUMzBg3dy6NuY=.91ee39cc-62d8-4fea-8f89-e953be9f2c99@github.com> On Wed, 16 Nov 2022 16:17:48 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. >> >> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. 
Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. >> >> While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Cleanups > - Merge remote-tracking branch 'upstream/master' into various_include_order_fixes > - Various include order fixes I've updated the patch for this RFR with just sort-order changes and blank line fixes around the include blocks. Everything else has been moved out of this PR. I hope this limited version of this cleanup isn't too controversial. ------------- PR: https://git.openjdk.org/jdk/pull/11108 From stefank at openjdk.org Thu Nov 17 08:16:30 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Nov 2022 08:16:30 GMT Subject: RFR: 8296886: Fix various include sort order issues [v2] In-Reply-To: <2lOleILw53UHm1WCmH4QPBsjbisb1h6H7Gz9jJRtJ8A=.139da306-4220-438c-be4c-64be6dc14c5d@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> <2lOleILw53UHm1WCmH4QPBsjbisb1h6H7Gz9jJRtJ8A=.139da306-4220-438c-be4c-64be6dc14c5d@github.com> Message-ID: <g1eRfN33PRGjljH1Hog1tXax1eVZp5kqbrl1nOiOiWc=.67c1484a-a92d-44f0-924d-7daf2c2a80db@github.com> On Fri, 11 Nov 2022 21:01:31 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: >> >> - Cleanups >> - Merge remote-tracking branch 'upstream/master' into various_include_order_fixes >> - Various include order fixes > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 30: > >> 28: #include "compiler/compiler_globals.hpp" >> 29: #include "compiler/disassembler.hpp" >> 30: #include "crc32c.h" > > This ought to remain at the end, included using `CPU_HEADER_H("crc32c")`, but the file doesn't have the appropriate suffix. I think the file name should be changed so that can be done. This seems to be the only C++ file under cpu/ that doesn't have the appropriate suffix. There are similarly "misnamed" files under various os/ subdirs; I didn't look for any in os_cpu/. I'd like to defer that discussion to a separate RFE. > src/hotspot/os_cpu/bsd_zero/os_bsd_zero.cpp line 57: > >> 55: #if !defined(__APPLE__) && !defined(__NetBSD__) >> 56: #include <pthread.h> >> 57: # include <pthread_np.h> /* For pthread_attr_get_np */ > > Remove unneeded space. I reverted all such space cleanups, in the interest of getting the limited cleanup fixed. If we want to fix these spaces I'd like that to happen as a separate RFE. > src/hotspot/share/cds/classListParser.cpp line 42: > >> 40: #include "interpreter/bytecodeStream.hpp" >> 41: #include "interpreter/linkResolver.hpp" >> 42: #include "jimage.hpp" > > Sorting jimage.hpp with hotspot/share stuff seems weird, and is kind of hiding this external dependency. (It's coming from java.base.) We can hopefully discuss this as a separate RFE. > src/hotspot/share/prims/whitebox.cpp line 26: > >> 24: >> 25: #include "precompiled.hpp" >> 26: #include <new> > > Why was `<new>` removed? We tend to pull it in via memory/allocation.hpp. I can revert this, but I'm not sure it is important. There's no consistency in when we include `<new>` or not. 
> test/hotspot/gtest/jfr/test_adaptiveSampler.cpp line 48: > >> 46: #include "unittest.hpp" >> 47: >> 48: #include <cmath> > > Why is this after unittest.hpp? System includes go after HotSpot includes. ------------- PR: https://git.openjdk.org/jdk/pull/11108 From stefank at openjdk.org Thu Nov 17 08:16:30 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Nov 2022 08:16:30 GMT Subject: RFR: 8296886: Fix various include sort order issues [v2] In-Reply-To: <MfGGgkYsvTjqm5ClboYxOSRR5ar9iNUGAzINXl3gkOQ=.67e650a1-31b5-4cb4-a5bb-e62389c8c4ba@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> <2lOleILw53UHm1WCmH4QPBsjbisb1h6H7Gz9jJRtJ8A=.139da306-4220-438c-be4c-64be6dc14c5d@github.com> <MfGGgkYsvTjqm5ClboYxOSRR5ar9iNUGAzINXl3gkOQ=.67e650a1-31b5-4cb4-a5bb-e62389c8c4ba@github.com> Message-ID: <i33cy0VH1KgvNnU7xyk41MZyVweNzlYzOh5fDplaESI=.6fcf06b6-a459-49a9-901d-f88f907c2f4a@github.com> On Mon, 14 Nov 2022 11:39:10 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> src/hotspot/os/windows/jvm_windows.cpp line 27: >> >>> 25: #include "precompiled.hpp" >>> 26: #include "include/jvm.h" >>> 27: #include "os_windows.hpp" >> >> os_windows should be at the end, included using `OS_HEADER("os")`. > > But should we be directly including os_windows.hpp, rather than including os.hpp? Deferring those discussions to a potential separate RFE. 
------------- PR: https://git.openjdk.org/jdk/pull/11108 From stuefe at openjdk.org Thu Nov 17 08:30:27 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 08:30:27 GMT Subject: RFR: JDK-8296931: NMT tests slowed down considerably by JDK-8242181 In-Reply-To: <ixTjmvYdlqKWqCSYok0Dx6OaKhjGWeAClyhDl1V9q9E=.8ca7adcf-8c62-4d83-a956-627c6d65b233@github.com> References: <UYwJT-bbfC3AmDSO7M2rT68qg3pT1PEK0tr90gQ9yY4=.f5ac68d4-380e-4383-b24d-4b98272dabd7@github.com> <ixTjmvYdlqKWqCSYok0Dx6OaKhjGWeAClyhDl1V9q9E=.8ca7adcf-8c62-4d83-a956-627c6d65b233@github.com> Message-ID: <OLxeqIl561Gxhn77WlMo3nJTKZA3YiQzpkJiSS6DUEQ=.b9d5b82f-47a9-4c33-bec4-16de93c28e2d@github.com> On Thu, 17 Nov 2022 02:54:21 GMT, David Holmes <dholmes at openjdk.org> wrote: > Sorry have been swamped past couple of days. > > From the style guide: > > > As a general rule don't add bug numbers to comments (they would soon overwhelm the code). But if the bug report contains significant information that can't reasonably be added as a comment, then refer to the bug report. > > I would probably not have added it in this case but its a judgment call. Okay, good to know. I'll remove it should I touch the code again. Not worth a separate RFE. 
------------- PR: https://git.openjdk.org/jdk/pull/11135 From stuefe at openjdk.org Thu Nov 17 08:31:34 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 08:31:34 GMT Subject: RFR: JDK-8296437: NMT incurs costs if disabled [v3] In-Reply-To: <bBXKxyJ1c2PB_VWazKcU6lAYk3HxSKvylQh77DBOPYE=.0dc05d6f-7bdd-4c74-be09-19a9be60ddc4@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> <ydQdOtTlFJ407X_AGYWpPSksUJEjvs7B9zZxsOO7EKU=.5fd56bb1-dc5c-4b8f-8891-d44cfec095d4@github.com> <bBXKxyJ1c2PB_VWazKcU6lAYk3HxSKvylQh77DBOPYE=.0dc05d6f-7bdd-4c74-be09-19a9be60ddc4@github.com> Message-ID: <7NkGBzPE2xauOEd7eegNvD6RYY-acjlHqqtxTjieUms=.35b43fe1-cd29-4c4a-9784-cb46f798a36e@github.com> On Thu, 17 Nov 2022 02:35:20 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Feedback Ioi and David > > Sorry for the delay - been swamped the past couple of days. > > Looks fine. Thanks. Thanks @dholmes-ora and @iklam ! ------------- PR: https://git.openjdk.org/jdk/pull/11040 From stuefe at openjdk.org Thu Nov 17 08:35:07 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 08:35:07 GMT Subject: Integrated: JDK-8296437: NMT incurs costs if disabled In-Reply-To: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> References: <i7_jN-SiNi7dpkBsdk7mUzXLobpYpOryVVWol4rRd2A=.f2672500-4475-46ea-afc5-dbeb3a87a45f@github.com> Message-ID: <6hao2VaQ6T0n_NJgcWCUw7j165oCaT3rIdqRnpHQVvc=.09ac4aaa-5fb9-4a1e-b605-67de286b568e@github.com> On Tue, 8 Nov 2022 14:40:10 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > While investigating the performance of the os::malloc wrapper, I noticed that we spend a lot of cycles copying empty callstacks around, even if NMT is disabled. 
> > The CURRENT_PC and CALLER_PC macros are used to create `NativeCallStack` objects out of thin air : > > > #define CURRENT_PC ((MemTracker::tracking_level() == NMT_detail) ? \ > NativeCallStack(0) : NativeCallStack::empty_stack()) > #define CALLER_PC ((MemTracker::tracking_level() == NMT_detail) ? \ > NativeCallStack(1) : NativeCallStack::empty_stack()) > > > and feed them to a callee routine, which usually has the argument defined via const reference, e.g. os::malloc: > > > void* os::malloc(size_t size, MEMFLAGS memflags, const NativeCallStack& stack); > > > In CURRENT|CALLER_PC, the left hand of the ':' operator handles the detail mode, when we actually do collect a stack. In that case, the stack sits on the thread stack as an automatic anonymous variable and is filled by the stack walker. The right-hand of ':' handles the case when we don't want a stack. In that case, the intent is to hand down the reference to a pre-created "empty stack" singleton (NativeCallStack::empty_stack()). > > However, that does not work as intended. The C++ compiler - at least gcc on linux - interprets these as copy-by-value and generates code that always laboriously copies the content of the empty stack singleton onto the thread stack. It uses four SSE instructions - two 16byte loads, and two 16byte moves (the NMT stacks are by default 4 frames, so 4 pointer-sized slots): > > > 0000000000cb9a60 <_ZN2os6mallocEm8MEMFLAGS>: > ... > # Load tracking level > cb9a77: 48 8d 1d 02 35 78 00 lea 0x783502(%rip),%rbx # 143cf80 <_ZN10MemTracker15_tracking_levelE> > cb9a7e: 8b 03 mov (%rbx),%eax > # detail (3) tracking? 
> cb9a80: 83 f8 03 cmp $0x3,%eax > # yes: go and collect callstack > cb9a83: 0f 84 57 01 00 00 je cb9be0 <_ZN2os6mallocEm8MEMFLAGS+0x180> > # no: copy the content of NativeCallStack::_empty_stack to the local stack, in 16 byte intervals: > cb9a89: 48 8d 05 30 44 78 00 lea 0x784430(%rip),%rax # 143dec0 <_ZN15NativeCallStack12_empty_stackE> > cb9a90: f3 0f 6f 00 movdqu (%rax),%xmm0 > cb9a94: f3 0f 6f 48 10 movdqu 0x10(%rax),%xmm1 > cb9a99: 0f 11 45 c0 movups %xmm0,-0x40(%rbp) > cb9a9d: 0f 11 4d d0 movups %xmm1,-0x30(%rbp) > ... > # do the actual malloc: > cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc@plt> > > # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX): > cb9b0f: 48 8d 4d c0 lea -0x40(%rbp),%rcx > ... > cb9b19: e8 f2 b7 f3 ff callq bf5310 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> > > > This is completely unnecessary, since if NMT mode != detail, the stack is never used. This hits every call site where these macros are used, and we pay if NMT is disabled. > > --------------------- > > The patch changes the macros to avoid initialization of `NativeCallStack` if NMT is off or in summary mode only. > > This was a bit tricky to do, since I wanted the compiler to not do anything if NMT is disabled, and of course I did not want to change the semantics of CALLER|CURRENT_PC. > > In the end I settled for exchanging the explicit calls to `NativeCallStack::empty_stack()` to calls to the default constructor. I changed the default constructor to a no-op. So the NativeCallStack object is not initialized, the compiler optimizes the empty constructor call away. In NMT=off, we are done; in NMT=summary mode, we now just hand down the pointer to the uninitialized NativeCallStack to MallocTracker::record_malloc(), which will ignore it anyway: > > > 0000000000cb98f0 <_ZN2os6mallocEm8MEMFLAGS>: > ... 
> # load tracking level > cb9907: 48 8d 1d 72 46 78 00 lea 0x784672(%rip),%rbx # 143df80 <_ZN10MemTracker15_tracking_levelE> > cb990e: 8b 03 mov (%rbx),%eax > # detail (3) tracking? > cb9910: 83 f8 03 cmp $0x3,%eax > # yes: go and collect callstack > cb9913: 0f 84 37 01 00 00 je cb9a50 <_ZN2os6mallocEm8MEMFLAGS+0x160> > # no: nothing more to do ... > ... > # do the actual malloc: > cb9af8: e8 c3 40 5d ff callq 28dbc0 <malloc at plt> > ... > # call MallocTracker::record_malloc() and hand down pointer to NMT stack (4th argument->RCX). The stack remains uninitialized, that is fine, since the MallocTracker will ignore it anyway: > cb9987: 48 8d 4d c0 lea -0x40(%rbp),%rcx > .. > cb9991: e8 ba b8 f3 ff callq bf5250 <_ZN13MallocTracker13record_mallocEPvm8MEMFLAGSRK15NativeCallStack> > > > There were only two callers of the default constructor that used it, and I changed them to use `NativeCallStack ncs(NULL, 0);` which is functionally equivalent. > > -------------- > > Results: > > When profiling, I see os::malloc now needs less cycles, and the hotspot around the xmm instructions is not there anymore. This pull request has now been integrated. 
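For readers following the copy problem described above: a minimal, self-contained sketch of why the old macro shape copies the singleton. `Stack`, `empty_stack()`, `record()`, `copies_old()` and `copies_new()` are hypothetical stand-ins invented for this illustration, not the HotSpot `NativeCallStack` code. The assumption is that the copy comes from the conditional operator itself: with a temporary on one side of ':' and an lvalue on the other, the whole expression becomes a temporary, so the lvalue branch has to be copied into it. With temporaries on both sides, nothing is copied.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for NativeCallStack; it only counts copies.
struct Stack {
  static int copies;                 // number of copy-constructor calls
  const void* frames[4];

  Stack() {}                         // no-op: frames stay uninitialized
  explicit Stack(int) : frames{} {}  // stands in for the real stack walk
  Stack(const Stack& other) {
    for (std::size_t i = 0; i < 4; i++) frames[i] = other.frames[i];
    ++copies;
  }
};
int Stack::copies = 0;

// Pre-created "empty stack" singleton, handed out by reference.
static const Stack g_empty(0);
static const Stack& empty_stack() { return g_empty; }

// Callee takes a const reference and ignores the stack unless detail is set.
static bool record(const Stack& s, bool detail) {
  return detail && s.frames[0] == nullptr;
}

// Old macro shape: temporary on the left of ':', lvalue on the right.
// The result of ?: is a temporary, so the singleton gets copied.
static int copies_old(bool detail) {
  Stack::copies = 0;
  record(detail ? Stack(0) : empty_stack(), detail);
  return Stack::copies;
}

// New macro shape: both operands are temporaries of the same type.
// The no-op default constructor leaves nothing to copy or initialize.
static int copies_new(bool detail) {
  Stack::copies = 0;
  record(detail ? Stack(0) : Stack(), detail);
  return Stack::copies;
}
```

With detail tracking off, `copies_old(false)` reports one copy of the singleton per call and `copies_new(false)` reports none (assuming a compiler that elides the temporary, which C++17 guarantees). The price of the new shape is that the callee must not read the stack unless detail mode is on, since its contents are uninitialized.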
Changeset: 9f8b6d2a Author: Thomas Stuefe <stuefe at openjdk.org> URL: https://git.openjdk.org/jdk/commit/9f8b6d2aa6733efb69d2d4f7e5f9e09dc5df9800 Stats: 30 lines in 3 files changed: 27 ins; 0 del; 3 mod 8296437: NMT incurs costs if disabled Reviewed-by: dholmes, iklam ------------- PR: https://git.openjdk.org/jdk/pull/11040 From stuefe at openjdk.org Thu Nov 17 08:40:03 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 08:40:03 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v3] In-Reply-To: <-YBmWkv1MgkIsEoMtU2nWWZouVioSN-k6Tz3-D6K6yw=.0c867d12-4532-4369-8a94-e23c1d667408@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> <-YBmWkv1MgkIsEoMtU2nWWZouVioSN-k6Tz3-D6K6yw=.0c867d12-4532-4369-8a94-e23c1d667408@github.com> Message-ID: <7DLTFuOCmELnQZ8Ulm3dExGES_XIvUq8q69AhzGrxWQ=.ef61b5fa-ba3c-462f-8ec9-7141e722e9a8@github.com> On Wed, 16 Nov 2022 17:34:44 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors' of github.com:tstuefe/jdk into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors >> - Feedback David > > src/hotspot/share/utilities/vmError.cpp line 1641: > >> 1639: _current_step_info, id); >> 1640: if (os::exception_name(id, tmp, sizeof(tmp))) { >> 1641: st->print(", %s (0x%x) at pc=" PTR_FORMAT, tmp, id, p2i(pc)); > > Not really relevant for this PR. But do not like the inconsistency that the id is printed as hex here, decimal in os::print_siginfo for posix, and hex in os::print_siginfo for windows. I think the whole concept of VMError::id is not that good. 
It holds either SEH codes (Windows), signal numbers or some enum we ourselves define. So it combines several orthogonal things, and the encoding is not clear. I have several half-done patches lying around cleaning it up, but never got around to finishing them. I think some hierarchical structure would be better, e.g. errortype="crash|assert|oom", with auxiliary information depending on errortype attached. The encoding can be as dense as a 32-bit value, if one wanted. But that would also sort out the printing problem, since printing depends on what the id is in the first place (e.g. signals are usually printed with d, but SEH codes as 32bit hex). ------------- PR: https://git.openjdk.org/jdk/pull/11118 From eosterlund at openjdk.org Thu Nov 17 09:24:04 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 17 Nov 2022 09:24:04 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v4] In-Reply-To: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> Message-ID: <dgIMUPeDO7sqZR_QCaPi6JpoZJKCNJ9-QoN97QuZ-y8=.6bf9ec32-8a59-4646-a021-d98ff53f36c4@github.com> > The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. > > In particular, > 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. > > 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread.
This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. > > 3) Refactoring the stack chunk allocation code > > Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Fix Richard comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11111/files - new: https://git.openjdk.org/jdk/pull/11111/files/b20563f5..3de25624 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11111&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11111&range=02-03 Stats: 5 lines in 2 files changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11111.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11111/head:pull/11111 PR: https://git.openjdk.org/jdk/pull/11111 From sspitsyn at openjdk.org Thu Nov 17 09:26:32 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 17 Nov 2022 09:26:32 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM Message-ID: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> The `VirtualThread` static field `notifyJvmtiEvents` is not set correctly when JVMTI agents are loaded into a running VM. This is because an extra call to java_lang_VirtualThread::init_static_notify_jvmti_events() is needed.
This function is called once at VM initialization, so this extra call is not necessary for agents loaded at startup. Testing: A new test is added: `test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest` This test fails without the fix and passes with it. TBD: run all JVMTI and JDI tests in mach5. ------------- Commit messages: - fixed traling whitespace in jvmtiExport.cpp - 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM Changes: https://git.openjdk.org/jdk/pull/11204/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11204&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296324 Stats: 182 lines in 3 files changed: 181 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11204.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11204/head:pull/11204 PR: https://git.openjdk.org/jdk/pull/11204 From eosterlund at openjdk.org Thu Nov 17 09:30:25 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 17 Nov 2022 09:30:25 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v3] In-Reply-To: <6p2iTiK-RvQtQUUvZHID1kpjZEB1wb72CHEi5X_-zuA=.c447ad16-8a1c-4af3-a062-0b1acbbcc1d0@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <tnSpXc7Z_RBHBNFrP-r-Fks1hWzplEn_OIwCwk5Vwo4=.94fb66f1-e927-4b73-b6e2-b8ecc01e903b@github.com> <6p2iTiK-RvQtQUUvZHID1kpjZEB1wb72CHEi5X_-zuA=.c447ad16-8a1c-4af3-a062-0b1acbbcc1d0@github.com> Message-ID: <6mluvJmDKsrEmQ7eHAGIkJFkTeAtBugp2J0ZxG7bx_E=.d57b6f9f-e14c-490e-b455-f5eaa7c99da4@github.com> On Wed, 16 Nov 2022 15:47:37 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Indentation fix > > Hi @fisk, I've skimmed the changes. They look good to me. I do have a few comments/questions also. Thanks @reinrich for the review!
I have pushed a comment for the continuation oops where they are handled as requested. > src/hotspot/share/gc/shared/barrierSetStackChunk.cpp line 68: > >> 66: >> 67: virtual void do_oop(oop* p) override { >> 68: if (UseCompressedOops) { > > Wouldn't it be better to hoist the check for `UseCompressedOops`? The compiler should be able to do that already. We devirtualize calls into oop closures, and the closure is stack allocated. So the compiler should be able to do that if it finds that it is a good idea. I'd prefer to leave that to the compiler. > src/hotspot/share/gc/shenandoah/shenandoahBarrierSetStackChunk.cpp line 30: > >> 28: >> 29: void ShenandoahBarrierSetStackChunk::encode_gc_mode(stackChunkOop chunk, OopIterator* oop_iterator) { >> 30: // Nothing to do > > Shenandoah allows `UseCompressedOops` enabled, doesn't it? Isn't it necessary then to do the encoding as in the super class? No we don't convert the oops for Shenandoah. Instead, Shenandoah's closures know how to deal with both oop* and narrowOop* on the heap, and will get passed the appropriate type of pointer. So it doesn't use the bitmap. I have tested that it works with Shenandoah as well. ------------- PR: https://git.openjdk.org/jdk/pull/11111 From alanb at openjdk.org Thu Nov 17 10:17:34 2022 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 17 Nov 2022 10:17:34 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v10] In-Reply-To: <E7HwXXjC4P7B2RXor6pCyw9g8ZaW63bkvF_vZBw7Obs=.bf6400de-54cc-429f-a898-d27328e542f6@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <E7HwXXjC4P7B2RXor6pCyw9g8ZaW63bkvF_vZBw7Obs=.bf6400de-54cc-429f-a898-d27328e542f6@github.com> Message-ID: <h6W7w0qfWR0yfA7kNKxpNT84qnGWNdjUTYyLJjA1i4A=.e6f9381f-804f-4df6-a54f-e3abf5e56184@github.com> On Wed, 16 Nov 2022 16:55:24 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. 
> > Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: > > - Javadoc changes. > - ProblemList.txt cleanup src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructuredTaskScope.java line 225: > 223: * </ul> > 224: * > 225: * The <i>descendants</i> of a task scope that are child task scopes that it is a parent There is a typo in the update (my fault). Suggestion: * The <i>descendants</i> of a task scope are the child task scopes that it is a parent ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aboldtch at openjdk.org Thu Nov 17 10:29:34 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 17 Nov 2022 10:29:34 GMT Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability [v2] In-Reply-To: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> Message-ID: <qZ1KS_MbDBbazQdi8qQNeaFqgzCFGcNEGH5wlRvYFZk=.e3900cf8-1acb-41bb-9126-ede3954cdb10@github.com> > Refactor the STEP macro in VMError::report to improve readability. > Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand. > > This enhancement aims to do two things: > 1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer. > 2. 
It separates the internal step logic from the decision logic, which decides whether a step should be taken, using a STEP_IF(step_name_str, condition) macro > > Testing: tier 1 + GHA Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: - Follow HotSpot code style: no implicit boolean - Respect 100 character line - Revert extended test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11018/files - new: https://git.openjdk.org/jdk/pull/11018/files/944ce8d1..b483c21c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11018&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11018&range=00-01 Stats: 31 lines in 2 files changed: 7 ins; 10 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/11018.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11018/head:pull/11018 PR: https://git.openjdk.org/jdk/pull/11018 From rrich at openjdk.org Thu Nov 17 11:20:59 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 17 Nov 2022 11:20:59 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v3] In-Reply-To: <6mluvJmDKsrEmQ7eHAGIkJFkTeAtBugp2J0ZxG7bx_E=.d57b6f9f-e14c-490e-b455-f5eaa7c99da4@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <tnSpXc7Z_RBHBNFrP-r-Fks1hWzplEn_OIwCwk5Vwo4=.94fb66f1-e927-4b73-b6e2-b8ecc01e903b@github.com> <6p2iTiK-RvQtQUUvZHID1kpjZEB1wb72CHEi5X_-zuA=.c447ad16-8a1c-4af3-a062-0b1acbbcc1d0@github.com> <6mluvJmDKsrEmQ7eHAGIkJFkTeAtBugp2J0ZxG7bx_E=.d57b6f9f-e14c-490e-b455-f5eaa7c99da4@github.com> Message-ID: <ufeb-OJtXTi1N5taVYZWVcNNLCZhZpQ3AK5U7iijDxI=.8eef7cb0-fcf0-4b46-9894-b78f4f8ad543@github.com> On Thu, 17 Nov 2022 09:23:48 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> src/hotspot/share/gc/shared/barrierSetStackChunk.cpp line 68: >> >>> 66: >>> 67: virtual void do_oop(oop* p) override { >>> 68: if (UseCompressedOops) { >> >> Wouldn't it be better to hoist the check for
`UseCompressedOops`? > > The compiler should be able to do that already. We devirtualize calls into oop closures, and the closure is stack allocated. So the compiler should be able to do that if it finds that it is a good idea. I'd prefer to leave that to the compiler. `CompressOopsOopClosure::do_oop()` and `FrameOopIterator::oops_do()` are defined in different compilation units. So calls to `do_oop()` cannot be devirtualized or am I missing something? Mistaken or not, I'm ok with this version. >> src/hotspot/share/gc/shenandoah/shenandoahBarrierSetStackChunk.cpp line 30: >> >>> 28: >>> 29: void ShenandoahBarrierSetStackChunk::encode_gc_mode(stackChunkOop chunk, OopIterator* oop_iterator) { >>> 30: // Nothing to do >> >> Shenandoah allows `UseCompressedOops` enabled, doesn't it? Isn't it necessary then to do the encoding as in the super class? > > No we don't convert the oops for Shenandoah. Instead, Shenandoah's closures know how to deal with both oop* and narrowOop* on the heap, and will get passed the appropriate type of pointer. So it doesn't use the bitmap. I have tested that it works with Shenandoah as well. Interesting and good to know. 
------------- PR: https://git.openjdk.org/jdk/pull/11111 From rrich at openjdk.org Thu Nov 17 11:26:22 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 17 Nov 2022 11:26:22 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v4] In-Reply-To: <dgIMUPeDO7sqZR_QCaPi6JpoZJKCNJ9-QoN97QuZ-y8=.6bf9ec32-8a59-4646-a021-d98ff53f36c4@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <dgIMUPeDO7sqZR_QCaPi6JpoZJKCNJ9-QoN97QuZ-y8=.6bf9ec32-8a59-4646-a021-d98ff53f36c4@github.com> Message-ID: <YO7GZySQe64k9iEGflP-dVV577ccZyUF1rfVbYvMvcE=.85793d31-b324-4231-b6dd-c1776766f0c2@github.com> On Thu, 17 Nov 2022 09:24:04 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. >> >> In particular, >> 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. >> >> 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill.
>> >> 3) Refactoring the stack chunk allocation code >> >> Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Fix Richard comments I am not an expert in every aspect, but the changes look good to me. Thanks, Richard. Marked as reviewed by rrich (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11111 From coleenp at openjdk.org Thu Nov 17 11:59:21 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 17 Nov 2022 11:59:21 GMT Subject: RFR: 8296492: Remove ObjectLocker in JVMTI get_subgroups call [v6] In-Reply-To: <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> <2y4cu8hdnxpyMRtiBhBZ86E3JBBK4JUbH6oc_BI3XDY=.6be9e293-900a-4b7f-a1f3-607a257c8b2a@github.com> Message-ID: <XxVGo70MHWPqRwuaDD1Gz3LGQeY_EVAhbsJ3s2eWBOU=.f62a7e64-33fa-4662-aa86-8771cdeb26d6@github.com> On Tue, 15 Nov 2022 18:52:37 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. >> Tested with tier 1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Review comments. Thanks for reviewing, Alan and David.
------------- PR: https://git.openjdk.org/jdk/pull/11033 From coleenp at openjdk.org Thu Nov 17 12:01:09 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 17 Nov 2022 12:01:09 GMT Subject: Integrated: 8296492: Remove ObjectLocker in JVMTI get_subgroups call In-Reply-To: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> References: <XYC_kEDCR3MnepMtTFrMYRmKKfhKzjg0NUQ2qjxVThQ=.88f82873-ed85-4e3c-8d90-3b7d3e94c248@github.com> Message-ID: <eSvlNS9ruIPZD45pFackU0zQjlZKPX24vnasBWvozyA=.ff0163e5-84f6-46a9-8e2e-0409e6a1769f@github.com> On Tue, 8 Nov 2022 00:58:44 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > The JVM code took a ThreadGroup lock before poking into ThreadGroup fields. Call a method in the ThreadGroup to call the synchronized method instead. > Tested with tier 1-4. This pull request has now been integrated. Changeset: d8c809b1 Author: Coleen Phillimore <coleenp at openjdk.org> URL: https://git.openjdk.org/jdk/commit/d8c809b196e98bbf22849ec06c6ee337005670e8 Stats: 139 lines in 7 files changed: 39 ins; 83 del; 17 mod 8296492: Remove ObjectLocker in JVMTI get_subgroups call Reviewed-by: dholmes, alanb, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/11033 From eosterlund at openjdk.org Thu Nov 17 12:10:58 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 17 Nov 2022 12:10:58 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v4] In-Reply-To: <YO7GZySQe64k9iEGflP-dVV577ccZyUF1rfVbYvMvcE=.85793d31-b324-4231-b6dd-c1776766f0c2@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <dgIMUPeDO7sqZR_QCaPi6JpoZJKCNJ9-QoN97QuZ-y8=.6bf9ec32-8a59-4646-a021-d98ff53f36c4@github.com> <YO7GZySQe64k9iEGflP-dVV577ccZyUF1rfVbYvMvcE=.85793d31-b324-4231-b6dd-c1776766f0c2@github.com> Message-ID: <k1fieSQKVJIFjKt69KxLdkZqTWMVWjEYgJ5cQoyXUQk=.e6b0eccc-db92-4269-bd88-3a681aa98ad8@github.com>
On Thu, 17 Nov 2022 11:23:07 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Richard comments > > Marked as reviewed by rrich (Reviewer). Thanks for the review, @reinrich! ------------- PR: https://git.openjdk.org/jdk/pull/11111 From eosterlund at openjdk.org Thu Nov 17 12:10:59 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 17 Nov 2022 12:10:59 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v3] In-Reply-To: <ufeb-OJtXTi1N5taVYZWVcNNLCZhZpQ3AK5U7iijDxI=.8eef7cb0-fcf0-4b46-9894-b78f4f8ad543@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <tnSpXc7Z_RBHBNFrP-r-Fks1hWzplEn_OIwCwk5Vwo4=.94fb66f1-e927-4b73-b6e2-b8ecc01e903b@github.com> <6p2iTiK-RvQtQUUvZHID1kpjZEB1wb72CHEi5X_-zuA=.c447ad16-8a1c-4af3-a062-0b1acbbcc1d0@github.com> <6mluvJmDKsrEmQ7eHAGIkJFkTeAtBugp2J0ZxG7bx_E=.d57b6f9f-e14c-490e-b455-f5eaa7c99da4@github.com> <ufeb-OJtXTi1N5taVYZWVcNNLCZhZpQ3AK5U7iijDxI=.8eef7cb0-fcf0-4b46-9894-b78f4f8ad543@github.com> Message-ID: <inQoSX7uD99FZYWuanNeSgoyoWEkxPXDBc01k2Olb8o=.aba8b68c-b668-44fb-83f1-a9c909555864@github.com> On Thu, 17 Nov 2022 11:16:52 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> The compiler should be able to do that already. We devirtualize calls into oop closures, and the closure is stack allocated. So the compiler should be able to do that if it finds that it is a good idea. I'd prefer to leave that to the compiler. > > `CompressOopsOopClosure::do_oop()` and `FrameOopIterator::oops_do()` are defined in different compilation units. So calls to `do_oop()` cannot be devirtualized or am I missing something? > Mistaken or not, I'm ok with this version. Sorry, my bad. You are right - it can't devirtualize. Anyway, I'd like to keep it the way it is as I don't think it's worth optimizing this.
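To make the two options in this exchange concrete, here is a rough sketch of what "hoisting the check" could look like next to the per-call check. Every name in it (`use_compressed`, `SlotClosure`, `PerCallCheck`, `Hoisted`, `iterate`, the shift by 3) is made up for illustration; this is not the code in barrierSetStackChunk.cpp, and the PR kept the per-call form. The sketch only assumes a closure that is invoked through a virtual call from an iterator that cannot see its definition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the UseCompressedOops flag.
static bool use_compressed = true;

struct SlotClosure {
  virtual void do_slot(uint64_t* p) = 0;
  virtual ~SlotClosure() {}
};

// Per-call shape: the flag is tested inside every virtual call.
struct PerCallCheck : SlotClosure {
  void do_slot(uint64_t* p) override {
    if (use_compressed) *p >>= 3;   // pretend "encode" step
  }
};

// Hoisted shape: the flag becomes a template parameter, so each
// instantiation has the decision baked in at compile time.
template <bool compressed>
struct Hoisted : SlotClosure {
  void do_slot(uint64_t* p) override {
    if (compressed) *p >>= 3;
  }
};

// Stands in for an iterator defined elsewhere: it only sees the base class.
static void iterate(std::vector<uint64_t>& slots, SlotClosure& cl) {
  for (uint64_t& s : slots) cl.do_slot(&s);
}

static std::vector<uint64_t> encode_per_call(std::vector<uint64_t> slots) {
  PerCallCheck cl;
  iterate(slots, cl);
  return slots;
}

static std::vector<uint64_t> encode_hoisted(std::vector<uint64_t> slots) {
  // The flag is read once here instead of once per slot.
  if (use_compressed) {
    Hoisted<true> cl;
    iterate(slots, cl);
  } else {
    Hoisted<false> cl;
    iterate(slots, cl);
  }
  return slots;
}
```

Both variants produce the same result; the hoisted one trades a branch per slot for a second closure instantiation. Whether that is worth it depends on how hot the loop is, which is the judgment call made in the thread above.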
------------- PR: https://git.openjdk.org/jdk/pull/11111 From chagedorn at openjdk.org Thu Nov 17 12:23:07 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 17 Nov 2022 12:23:07 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag In-Reply-To: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> Message-ID: <rQ4K51zXOhklrxsshntEEoRZ07QKnZTYqbgZCb-Et1s=.cd281655-8800-49f1-aa17-c9a28820d244@github.com> On Thu, 17 Nov 2022 11:57:02 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. > > Thanks, > Tobias Removal looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.org/jdk/pull/11207 From thartmann at openjdk.org Thu Nov 17 12:23:06 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Nov 2022 12:23:06 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag Message-ID: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. 
Thanks, Tobias ------------- Commit messages: - Missed do_aliasing() check - Added sanity assert and cleanup - 8297201: Obsolete AliasLevel flag Changes: https://git.openjdk.org/jdk/pull/11207/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11207&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297201 Stats: 75 lines in 10 files changed: 2 ins; 55 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/11207.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11207/head:pull/11207 PR: https://git.openjdk.org/jdk/pull/11207 From thartmann at openjdk.org Thu Nov 17 12:23:08 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Nov 2022 12:23:08 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag In-Reply-To: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> Message-ID: <j9AKI5b8Ci5UDuUcIkm37d_K2LmC118in5oLN3Odvqs=.80f1bd65-87de-4877-b93c-aa16f43bf41c@github.com> On Thu, 17 Nov 2022 11:57:02 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. > > Thanks, > Tobias Thanks, Christian! 
------------- PR: https://git.openjdk.org/jdk/pull/11207 From thartmann at openjdk.org Thu Nov 17 12:37:13 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Nov 2022 12:37:13 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag [v2] In-Reply-To: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> Message-ID: <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> > The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Removed checks from flatten_alias_type method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11207/files - new: https://git.openjdk.org/jdk/pull/11207/files/7bb62ef6..68719301 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11207&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11207&range=00-01 Stats: 8 lines in 1 file changed: 1 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11207.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11207/head:pull/11207 PR: https://git.openjdk.org/jdk/pull/11207 From tholenstein at openjdk.org Thu Nov 17 12:37:14 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 17 Nov 2022 12:37:14 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag In-Reply-To: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> Message-ID: 
<HwAp73l9b7ZEOJx-T_gn3k5HetMdzE3XOMZAhzbeo1M=.680c85e1-b4e6-40be-a20e-9558d1796596@github.com> On Thu, 17 Nov 2022 11:57:02 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. > > Thanks, > Tobias I think `do_aliasing()` is always true in `*Compile::flatten_alias_type( const TypePtr *tj )` because `flatten_alias_type` is only called in the `Compile::find_alias_type(..)` function after we checked that `do_aliasing()` is true. ------------- PR: https://git.openjdk.org/jdk/pull/11207 From thartmann at openjdk.org Thu Nov 17 12:37:15 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Nov 2022 12:37:15 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag In-Reply-To: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> Message-ID: <6TgBCCBXyeudu9f6zxv8f6Oo9zRUazOh1oTb6DDUV9I=.78bf6b5c-9835-4278-b844-5d2c66cd60f4@github.com> On Thu, 17 Nov 2022 11:57:02 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. > > Thanks, > Tobias Good catch! I updated the fix accordingly. 
------------- PR: https://git.openjdk.org/jdk/pull/11207 From chagedorn at openjdk.org Thu Nov 17 12:44:50 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 17 Nov 2022 12:44:50 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag [v2] In-Reply-To: <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> Message-ID: <0jlYehrVhO2Cgh0ENsuogDdKYO2rvoTnBSxY3brrRDM=.fef87a61-1de0-4840-a361-dafd5daa3906@github.com> On Thu, 17 Nov 2022 12:37:13 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed checks from flatten_alias_type method Marked as reviewed by chagedorn (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/11207 From dholmes at openjdk.org Thu Nov 17 12:44:51 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 17 Nov 2022 12:44:51 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag [v2] In-Reply-To: <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> Message-ID: <qBkjF_9TA_5OKNULPrzl1L7V1EGBj2zIBhEUAIxcocY=.6a835ebd-0155-4d38-9165-b360f2ac6d6d@github.com> On Thu, 17 Nov 2022 12:37:13 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed checks from flatten_alias_type method This removal looks accurate: AliasLevel == 0 -> do_aliasing == false; else do_aliasing == true Thanks for fixing. ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11207 From tholenstein at openjdk.org Thu Nov 17 12:53:47 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 17 Nov 2022 12:53:47 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag [v2] In-Reply-To: <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> Message-ID: <f8FN1PDtV9rMeakJ3tBgOzOJWyxWAT2MRMKnjROnBJY=.8c582a7e-aebd-4e93-a6da-687310310bed@github.com> On Thu, 17 Nov 2022 12:37:13 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed checks from flatten_alias_type method Marked as reviewed by tholenstein (Committer). 
Looks good to me ------------- PR: https://git.openjdk.org/jdk/pull/11207 From thartmann at openjdk.org Thu Nov 17 12:53:48 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Nov 2022 12:53:48 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag [v2] In-Reply-To: <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> Message-ID: <LECfryXlYC9_0YWCiiCVrGrp-eMnjvBAOTtZTtPwQu4=.c1bce081-47f0-4b0b-81c3-88d093a19ada@github.com> On Thu, 17 Nov 2022 12:37:13 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed checks from flatten_alias_type method Christian, David, Toby, thanks for the reviews. I'll push this right away after some sanity testing to address the `gtest/GTestWrapper.java` failures. 
------------- PR: https://git.openjdk.org/jdk/pull/11207 From rcastanedalo at openjdk.org Thu Nov 17 13:14:22 2022 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 17 Nov 2022 13:14:22 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag [v2] In-Reply-To: <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> Message-ID: <T1vix_BVW-EJ-DqisNK6Po3n_u494MzaPTrfNy4V97s=.bb53493a-fc79-4c65-9c57-602bc489963f@github.com> On Thu, 17 Nov 2022 12:37:13 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed checks from flatten_alias_type method Looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11207 From thartmann at openjdk.org Thu Nov 17 13:23:24 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Nov 2022 13:23:24 GMT Subject: RFR: 8297201: Obsolete AliasLevel flag [v2] In-Reply-To: <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> <8FQfWAjXpLF_WgeipM5rjWgqXMSDRUXcblCPZDGrzB0=.130c4d93-92e0-4452-ba58-8c70fa321e74@github.com> Message-ID: <_2TGn06VCJffWgLFtMwG6dBfArTdBe4rHAFamxQnrf8=.cc65e549-6ed5-460e-aef5-b1c7f709d428@github.com> On Thu, 17 Nov 2022 12:37:13 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed checks from flatten_alias_type method Thanks, Roberto! 
------------- PR: https://git.openjdk.org/jdk/pull/11207 From thartmann at openjdk.org Thu Nov 17 13:27:28 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Nov 2022 13:27:28 GMT Subject: Integrated: 8297201: Obsolete AliasLevel flag In-Reply-To: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> References: <1wHWfHkkb8duzctHALbKiMs4Wh26fApFlWHvhHvYtzI=.4298caad-56fd-4f38-9c3f-10e21b3287bf@github.com> Message-ID: <9svicpkbGi0Zp-RshkeDFQv3n3eTI1oToAdEzwjFkMQ=.aba1f138-a999-4108-8bf0-87a2e5130b1f@github.com> On Thu, 17 Nov 2022 11:57:02 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > The AliasLevel flag was deprecated in JDK 19 by [JDK-8075816](https://bugs.openjdk.org/browse/JDK-8075816) and should now be obsoleted. Patch is slightly adjusted https://github.com/openjdk/jdk/pull/8140/commits/6a1bed2d7ed5494b051fae09d28787fd01de9635, originally from @tobiasholenstein. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: b6aff542 Author: Tobias Hartmann <thartmann at openjdk.org> URL: https://git.openjdk.org/jdk/commit/b6aff54245df09a004f0457d0824e763dfad333e Stats: 80 lines in 10 files changed: 3 ins; 60 del; 17 mod 8297201: Obsolete AliasLevel flag Co-authored-by: Tobias Holenstein <tholenstein at openjdk.org> Reviewed-by: chagedorn, dholmes, tholenstein, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/11207 From stuefe at openjdk.org Thu Nov 17 13:30:57 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 13:30:57 GMT Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability [v2] In-Reply-To: <qZ1KS_MbDBbazQdi8qQNeaFqgzCFGcNEGH5wlRvYFZk=.e3900cf8-1acb-41bb-9126-ede3954cdb10@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> <qZ1KS_MbDBbazQdi8qQNeaFqgzCFGcNEGH5wlRvYFZk=.e3900cf8-1acb-41bb-9126-ede3954cdb10@github.com> Message-ID: <frxRKBmZrdbJy0zSuP6vfz9oqk8QtnZHj8xvk529tr8=.1869268f-ab29-4d3b-b05d-96de895fb2b7@github.com> On Thu, 17 Nov 2022 10:29:34 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> Refactor the STEP macro in VMError::report to improve readability. >> Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand. >> >> This enhancement aims to do two things: >> 1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer. >> 2. Separates the internal step logic from the decision logic which decides if a step should be taken with a STEP_IF(step_name_str, condition) macro >> >> Testing: tier 1 + GHA > > Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: > > - Follow HotSpot code style: no implicit boolean > - Respect 100 character line > - Revert extended test Ok! 
src/hotspot/share/utilities/vmError.cpp line 545: > 543: _step_did_timeout = false; \ > 544: if ((cond)) { > 545: // [Step logic] nit: newline please? src/hotspot/share/utilities/vmError.cpp line 546: > 544: if ((cond)) { > 545: // [Step logic] > 546: # define STEP(s) STEP_IF(s, true) nit: newline please? ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/11018 From xlinzheng at openjdk.org Thu Nov 17 13:48:28 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Thu, 17 Nov 2022 13:48:28 GMT Subject: Integrated: 8296975: RISC-V: Enable UseRVA20U64 profile by default In-Reply-To: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> References: <dtew0HkGN3RyeX2CbEqwPOMNSA2AG4uc44UWoTdP8tg=.d756d671-1a5f-4b1f-ba8f-de969b5f62f6@github.com> Message-ID: <fyhzIAY8afTOP9upNfFE1R10cUAX-p3sAS2H-ProAN0=.e0e32b1c-cba4-4cac-901c-087b69aa338a@github.com> On Tue, 15 Nov 2022 04:05:35 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: > The main purpose is to turn the option `UseRVC` on by default before JDK20 RDP 1. As per discussions [1], we can enable `UseRVA20U64`[2] by default to fulfill this. > > >> build/linux-riscv64-server-fastdebug/images/jdk/bin/java -XX:+PrintFlagsFinal -version | grep -E "UseRVC|UseRVA20U64" > bool UseRVA20U64 = true {ARCH product} {default} > bool UseRVC = true {ARCH product} {default} > openjdk version "20-internal" 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc..jdk) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc..jdk, mixed mode) > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000668.html > [2] https://github.com/openjdk/jdk/blob/873eccde01895de06e2216f6838d52d07188addd/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L39-L44 > > Thanks, > Xiaolin This pull request has now been integrated. 
Changeset: 38eb80d4 Author: Xiaolin Zheng <xlinzheng at openjdk.org> Committer: Vladimir Kempik <vkempik at openjdk.org> URL: https://git.openjdk.org/jdk/commit/38eb80d4d89cf45cd0c8422525121dcb62a1e999 Stats: 8 lines in 2 files changed: 6 ins; 1 del; 1 mod 8296975: RISC-V: Enable UseRVA20U64 profile by default Reviewed-by: fyang, vkempik ------------- PR: https://git.openjdk.org/jdk/pull/11155 From aboldtch at openjdk.org Thu Nov 17 14:03:21 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 17 Nov 2022 14:03:21 GMT Subject: RFR: 8296785: Use realloc for CHeap-allocated BitMaps [v3] In-Reply-To: <gVFqJ_xOx-LskzaCU9mylHrccLnoCOAWu-zeK_IEzUU=.97674576-c596-48b2-8249-ec62f2edccf2@github.com> References: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> <gVFqJ_xOx-LskzaCU9mylHrccLnoCOAWu-zeK_IEzUU=.97674576-c596-48b2-8249-ec62f2edccf2@github.com> Message-ID: <0VDlIvJkWWMXlhAjqW2qXZj0oWlXSnpl6ggij-oqTUg=.03b3d6a0-4268-41b9-ba84-f9d046087ad5@github.com> On Wed, 16 Nov 2022 17:03:21 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today CHeap allocated bitmaps don't resize with realloc. I'd like to change that by fixing that by adding support for realloc in the ArrayAllocator classes, and then use that when resizing the bitmaps. >> >> We've been using and testing one version of this patch in the Generational ZGC repository for a while now. That version is slightly different because of recent rewrites of the bitmaps, but in essence the same. See: >> https://github.com/openjdk/zgc/commit/ca692f686bda8d86d3786c2afc782bfdc54fbdfc > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Fixes after merge > - Merge remote-tracking branch 'upstream/master' into 8296785_bitmap_realloc > - 8296785: Use realloc for CHeap-allocated BitMaps > - Merge remote-tracking branch 'upstream/master' into 8296774_bitmap_stricter_construction > - 8296774: Removed default MEMFLAGS value from CHeapBitMap Lgtm. A comment on MallocArrayAllocator. These are in general very dangerous to use (as are our macros that use AllocateHeap directly) with regard to C++ types and lifetimes. Bitmap only uses this for primitive types, which is generally safe. In general, construction and destruction of the objects must be handled explicitly and carefully. (Outside of MallocArrayAllocator, which only allocates some underlying memory, it does not create any C++ objects) Reallocation is even harder, as it is (probably) impossible to statically determine if a type T is fine to reallocate via relocation. So users of MallocArrayAllocator<E>::reallocate must consider carefully if this is ok for the type E. (I mean ok in the sense that it will generate the correct behaviour on our supported compilers, not in the sense that it is ok w.r.t. C++ object lifetimes) ------------- Marked as reviewed by aboldtch (Committer). 
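To illustrate the caveat about MallocArrayAllocator<E>::reallocate in the review comment above, here is a minimal standalone sketch. It is not HotSpot code: `reallocate_array` is a hypothetical helper, and the `std::is_trivially_copyable` check is only a conservative approximation (an assumption of this sketch) of "safe to move by copying raw bytes". As the comment notes, there is probably no exact static test.

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>
#include <type_traits>

// Hypothetical helper, not HotSpot's MallocArrayAllocator. realloc may move
// the bytes of the array without running any constructor or destructor, so
// the element type is restricted at compile time to types for which a raw
// byte copy is known to be valid.
template <typename E>
E* reallocate_array(E* old_array, std::size_t new_length) {
  static_assert(std::is_trivially_copyable<E>::value,
                "realloc moves raw bytes; E must be trivially copyable");
  if (new_length == 0) {
    // Avoid realloc(ptr, 0), whose behaviour is implementation-defined.
    std::free(old_array);
    return nullptr;
  }
  if (new_length > std::numeric_limits<std::size_t>::max() / sizeof(E)) {
    throw std::bad_alloc();  // byte size would overflow
  }
  void* p = std::realloc(old_array, new_length * sizeof(E));
  if (p == nullptr) {
    throw std::bad_alloc();  // old_array is still valid in this case
  }
  return static_cast<E*>(p);
}
```

With this restriction, instantiating the helper for a type such as std::string fails to compile, while word-sized bitmap storage (the case in this PR) is accepted.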
PR: https://git.openjdk.org/jdk/pull/11102 From stuefe at openjdk.org Thu Nov 17 14:04:27 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 14:04:27 GMT Subject: RFR: JDK-8296796: Provide clean, platform-agnostic interface to C-heap trimming [v2] In-Reply-To: <iEUHXJRgtb4XUbnFjt78pilRbh_UDIxMAwTcRBflVcg=.8d7f582f-f428-4793-9eff-b2152fa3d55c@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> <vBtNU4U04jMPutD1fcypaDk77fqS3pAbiVg7QHZa1i4=.624dd7f0-3500-4153-9c5f-56df44eef1f2@github.com> <iEUHXJRgtb4XUbnFjt78pilRbh_UDIxMAwTcRBflVcg=.8d7f582f-f428-4793-9eff-b2152fa3d55c@github.com> Message-ID: <_NgwqPcf4TlzSuFa2BscNgSOXnqAyPtD84WWTX-GsOQ=.2afe6ad7-7e4a-414f-96b2-3f39b7bc4dbd@github.com> On Thu, 17 Nov 2022 02:17:09 GMT, David Holmes <dholmes at openjdk.org> wrote: > I don't disagree that C-heap trimming is useful even if only Linux does it. My objection is to defining a platform-agnostic API when only Linux does it. I see. It will allow us to use these APIs in shared code though, without having to use platform ifdefs. We do similar things for other platforms (e.g. see os::map_stack_shadow_pages). 
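A rough sketch of the shape being argued for above: one shared entry point that callers can use without platform ifdefs, with the platform check hidden inside. The names `TrimResult` and `trim_native_heap` are made up for illustration and are not the interface proposed in the PR; only glibc's `malloc_trim` is a real function here.

```cpp
#include <cstddef>
#include <cstdlib>
#if defined(__GLIBC__)
#include <malloc.h>
#endif

// Hypothetical result type: tells the caller whether trimming is available
// on this platform at all, and whether memory was actually released.
struct TrimResult {
  bool supported;
  bool released;
};

// Hypothetical shared entry point. Shared code calls this unconditionally;
// platforms without a trim facility simply report "not supported".
inline TrimResult trim_native_heap() {
#if defined(__GLIBC__)
  // glibc: malloc_trim returns 1 if memory was returned to the system.
  return TrimResult{true, malloc_trim(0) == 1};
#else
  return TrimResult{false, false};
#endif
}
```

The trade-off discussed in the thread remains: the interface looks platform-agnostic, but only one platform currently does real work behind it.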
------------- PR: https://git.openjdk.org/jdk/pull/11089 From aboldtch at openjdk.org Thu Nov 17 14:12:24 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 17 Nov 2022 14:12:24 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v3] In-Reply-To: <ssm98JXq9y_-z7Em7AbEfff4nhOSUCV48c5_JBEuuVs=.162e5afe-b0c3-42a1-aabe-4e162a0c3a9f@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> <-YBmWkv1MgkIsEoMtU2nWWZouVioSN-k6Tz3-D6K6yw=.0c867d12-4532-4369-8a94-e23c1d667408@github.com> <ssm98JXq9y_-z7Em7AbEfff4nhOSUCV48c5_JBEuuVs=.162e5afe-b0c3-42a1-aabe-4e162a0c3a9f@github.com> Message-ID: <VoLCy850DNHFwRG9usfOSMumBLBR0uI8RETy-hcdEbE=.0d9d112b-04c2-4a52-95b4-701ff85efb2d@github.com> On Thu, 17 Nov 2022 07:42:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > since it would disable call stack printing for the next secondary error. Would it? The example above would only disable reentering the if scope if we have a crash in `[...]`. If it is fine the flag is reset, and if we crash in `[...]` it will not enter the if scope and reset the flag. When it crash in a new secondary error it will once again enter the if scope and try to print the call stack. But I agree that there is always this taking it one step deeper of where you make things recoverable. 
------------- PR: https://git.openjdk.org/jdk/pull/11118 From stuefe at openjdk.org Thu Nov 17 14:23:24 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 14:23:24 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v3] In-Reply-To: <VoLCy850DNHFwRG9usfOSMumBLBR0uI8RETy-hcdEbE=.0d9d112b-04c2-4a52-95b4-701ff85efb2d@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> <-YBmWkv1MgkIsEoMtU2nWWZouVioSN-k6Tz3-D6K6yw=.0c867d12-4532-4369-8a94-e23c1d667408@github.com> <ssm98JXq9y_-z7Em7AbEfff4nhOSUCV48c5_JBEuuVs=.162e5afe-b0c3-42a1-aabe-4e162a0c3a9f@github.com> <VoLCy850DNHFwRG9usfOSMumBLBR0uI8RETy-hcdEbE=.0d9d112b-04c2-4a52-95b4-701ff85efb2d@github.com> Message-ID: <w48ka93_YPdEYh7Pv45oKtbOunwV3NB1QJH7Qos7QvA=.f4904e7b-67d5-4aec-bf0e-ce355aa5d7eb@github.com> On Thu, 17 Nov 2022 14:09:47 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> It would work horizontally too, though, since it would disable call stack printing for the next secondary error. >> >> I still think yours is a good idea. We can extend it to using a counter and only allow a small number (2-5) of secondary errors to get printed out with their call stacks. Because if we have more than, say, 5 recursive errors, chances are they all trip over the same thing anyway and more call stacks won't tell you anything new. >> >> And then, if we limit the number of these things, maybe we can afford to leave this feature on by default in debug builds. That addresses the concern @dholmes-ora voiced, about having to switch on the feature first. > >> since it would disable call stack printing for the next secondary error. > > Would it? The example above would only disable reentering the if scope if we have a crash in `[...]`. 
If it is fine the flag is reset, and if we crash in `[...]` it will not enter the if scope and reset the flag. When it crash in a new secondary error it will once again enter the if scope and try to print the call stack. > > But I agree that there is always this taking it one step deeper of where you make things recoverable. Right, you did reset it, I see now. Sorry, did not look close enough :) ------------- PR: https://git.openjdk.org/jdk/pull/11118 From stuefe at openjdk.org Thu Nov 17 14:44:53 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 14:44:53 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v4] In-Reply-To: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> Message-ID: <ljKBJ_4Mx5YwvkzkCyMSWRdX8yy7IrJ1wmNHuuTlTJs=.8b96d272-4db2-46de-9e10-899999515c0b@github.com> > This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. > > To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. > > --- > > Patch > > - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. > - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. > - Removed a stray newline from print_native_stack to clean output. > - added regression testing for this feature. 
I removed my name from the test since we don't do this anymore. > - added clarifying comments to the test and code > - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) > > Output looks like this: > > > $ java ... -XX:+ErrorLogSecondaryErrorDetails > > > will produce, for secondary errors, siginfo and call stack. > > > [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] > [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] > [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) > V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) > V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) > V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) > V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) > V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) > C [libc.so.6+0x43090] > V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) > C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) > C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) > ] Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Feedback Axel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11118/files - new: https://git.openjdk.org/jdk/pull/11118/files/7b3a506a..06a9bd58 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=02-03 Stats: 20 lines in 1 
file changed: 9 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/11118.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11118/head:pull/11118 PR: https://git.openjdk.org/jdk/pull/11118 From stuefe at openjdk.org Thu Nov 17 14:44:56 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 17 Nov 2022 14:44:56 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v3] In-Reply-To: <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <qiIW8Ml8o8z8Q3sT38eXkRX01MvfQIAfmbD03iJ2Cdk=.a2c78a10-f32f-43e4-ae0b-a4185b57ec45@github.com> Message-ID: <63TuLtxqayu8UbFyro3sw35_0kQr2gyDqfb9qM-00wE=.191a8b31-0832-4b08-8aac-c9b02a4745b0@github.com> On Mon, 14 Nov 2022 07:29:25 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. >> >> To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. >> >> --- >> >> Patch >> >> - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. >> - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. >> - Removed a stray newline from print_native_stack to clean output. >> - added regression testing for this feature. I removed my name from the test since we don't do this anymore. 
>> - added clarifying comments to the test and code >> - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) >> >> Output looks like this: >> >> >> $ java ... -XX:+ErrorLogSecondaryErrorDetails >> >> >> will produce, for secondary errors, siginfo and call stack. >> >> >> [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] >> [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] >> [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) >> V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) >> V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) >> V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) >> V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) >> V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) >> C [libc.so.6+0x43090] >> V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) >> C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) >> C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) >> ] > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors' of github.com:tstuefe/jdk into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors > - Feedback David Followed Axels idea of limiting recursion errors when printing secondary error crashes. 
I briefly played with more changes, and optionally removing the new switch, but in the end kept the change simple. I also disabled source code printing from the secondary crash call stacks since that may be quite fragile. Re-ran tests on Linux x64 and x86. ------------- PR: https://git.openjdk.org/jdk/pull/11118 From lmesnik at openjdk.org Thu Nov 17 15:45:23 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 17 Nov 2022 15:45:23 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM In-Reply-To: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> Message-ID: <b-ilYV971NN2aY5tLCzEfCIk8SmXMGA-x6M_FM2_VgA=.c38b72d5-1f9e-4813-a1e9-e5e1c213d2d5@github.com> On Thu, 17 Nov 2022 09:12:07 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: > The `VirtualThread` static field `notifyJvmtiEvents` is not set correctly when JVMTI agents are loaded into a running VM. This is because an extra call to java_lang_VirtualThread::init_static_notify_jvmti_events() is needed. > This function is called once at VM initialization, so the extra call is not necessary for agents loaded at startup. > > Testing: > A new test is added: `test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest` > This test fails without the fix and passes with it. > TBD: run all JVMTI and JDI tests in mach5. 
test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest/VirtualStackTraceTest.java line 27: > 25: * @test > 26: * @summary Verifies JVMTI GetStackTrace does not truncate virtual thread stack trace with agent attach > 27: * @requires vm.continuations @requires vm.jvmti is also needed ------------- PR: https://git.openjdk.org/jdk/pull/11204 From duke at openjdk.org Thu Nov 17 16:20:29 2022 From: duke at openjdk.org (ExE Boss) Date: Thu, 17 Nov 2022 16:20:29 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v27] In-Reply-To: <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <-Lw-dDGfVAZlOT815DeyvfwP0NTWWbj4X0lrl9ek_iQ=.70a5ad19-062f-488d-97fb-f8d923c2dc17@github.com> Message-ID: <HAnwHSgGBY35ThuRENIFnsNwci18giPhEYTw-di36vk=.571ad158-2381-4e15-8cd1-f87c0193f4d1@github.com> On Tue, 15 Nov 2022 18:47:39 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in SegmentScope javadoc src/java.base/share/classes/jdk/internal/foreign/MemorySessionImpl.java line 77: > 75: } catch (Throwable ex) { > 76: throw new ExceptionInInitializerError(ex); > 77: } The above `catch` clause should only catch `Exception`s, not `Throwable`s, as the latter would hide VM errors such as `StackOverflowError` or `OutOfMemoryError`. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From psandoz at openjdk.org Thu Nov 17 16:55:26 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 17 Nov 2022 16:55:26 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v10] In-Reply-To: <E7HwXXjC4P7B2RXor6pCyw9g8ZaW63bkvF_vZBw7Obs=.bf6400de-54cc-429f-a898-d27328e542f6@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <E7HwXXjC4P7B2RXor6pCyw9g8ZaW63bkvF_vZBw7Obs=.bf6400de-54cc-429f-a898-d27328e542f6@github.com> Message-ID: <EJkCeTy8dOSzkBwHfx4snp6BBcIER2htjrWTxwFIeOY=.9fc75771-6651-4602-b187-6006bceed663@github.com> On Wed, 16 Nov 2022 16:55:24 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: > > - Javadoc changes. > - ProblemList.txt cleanup src/hotspot/share/utilities/exceptions.cpp line 166: > 164: // Remove the ScopedValue cache in case we got a virtual machine > 165: // Error while we were trying to manipulate ScopedValue bindings. > 166: thread->set_scopedValueCache(NULL); I see this pattern repeated quite often: thread->set_scopedValueCache(NULL); oop threadObj = thread->vthread(); assert(threadObj != NULL, "must be"); // <--- sometimes java_lang_Thread::clear_scopedValueBindings(threadObj); Encapsulate it in a method on the `JavaThread` class? 
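A minimal sketch of what such an encapsulation could look like, using stand-in types so it compiles outside HotSpot. `JavaThreadModel`, `ThreadObjectModel` and `clear_scoped_value_state` are hypothetical names for illustration, not the real HotSpot declarations, and whether the helper should assert or tolerate a null thread object is left open (the quoted pattern only asserts "sometimes").

```cpp
#include <cassert>

// Stand-in for the java.lang.Thread object; not a real HotSpot type.
struct ThreadObjectModel {
  bool bindings_cleared = false;
};

// Stand-in for JavaThread; not the real HotSpot class.
class JavaThreadModel {
  void* _scoped_value_cache;
  ThreadObjectModel* _vthread;

 public:
  JavaThreadModel(void* cache, ThreadObjectModel* vthread)
      : _scoped_value_cache(cache), _vthread(vthread) {}

  void* scoped_value_cache() const { return _scoped_value_cache; }
  ThreadObjectModel* vthread() const { return _vthread; }

  // Hypothetical helper: the repeated three-step sequence lives in one
  // place, so the null check cannot be forgotten at individual call sites.
  void clear_scoped_value_state() {
    _scoped_value_cache = nullptr;
    ThreadObjectModel* thread_obj = vthread();
    assert(thread_obj != nullptr && "must be");
    thread_obj->bindings_cleared = true;
  }
};
```

Call sites such as the one in exceptions.cpp would then reduce to a single call on the thread.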
------------- PR: https://git.openjdk.org/jdk/pull/10952 From kvn at openjdk.org Thu Nov 17 16:58:20 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Nov 2022 16:58:20 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v3] In-Reply-To: <WK7Sg9jDAwczPdU4Hax_iFJHBpipKrTBwAyXuX7IdlQ=.32813ab5-8c25-4dfe-9bc9-90d17c462af8@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <H7qFAMewkLJ4IWrVipLemv1iBgI5qrWOM1pfJ0p6hGk=.315fce9d-2c7d-42b8-a569-c74d8c7097f2@github.com> <WK7Sg9jDAwczPdU4Hax_iFJHBpipKrTBwAyXuX7IdlQ=.32813ab5-8c25-4dfe-9bc9-90d17c462af8@github.com> Message-ID: <syTpW1xc6IoV30N1_PLphGvd9jePaErgIFJ_bhCJoqU=.8ca9e2f2-1415-4514-9677-e319d89b05c0@github.com> On Thu, 3 Nov 2022 11:05:37 GMT, Andrew Haley <aph at openjdk.org> wrote: > > Changes are good. Can you tell more about `-fsanitize=null` effect on libjvm size and performance of fastdebug build we use in testing? If it is only few percents I am for enabling it in debug build. > > It might be a bit more than that: it's a test-and-branch on every memory access. Maybe enable it only on a non-optimized build? I am fine with enabling it for debug VM. But can you give at least some numbers? 
------------- PR: https://git.openjdk.org/jdk/pull/10920 From sspitsyn at openjdk.org Thu Nov 17 17:35:32 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 17 Nov 2022 17:35:32 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> Message-ID: <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> > The `VirtualThread` static field `notifyJvmtiEvents` is not set correctly when JVMTI agents are loaded into a running VM. This is because an extra call to java_lang_VirtualThread::init_static_notify_jvmti_events() is needed. > This function is called once at VM initialization, so the extra call is not necessary for agents loaded at startup. > > Testing: > A new test is added: `test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest` > This test fails without the fix and passes with it. > TBD: run all JVMTI and JDI tests in mach5. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: add @requires vm.jvmti to VirtualStackTraceTest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11204/files - new: https://git.openjdk.org/jdk/pull/11204/files/b995eb44..5a15a6da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11204&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11204&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11204.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11204/head:pull/11204 PR: https://git.openjdk.org/jdk/pull/11204 From sspitsyn at openjdk.org Thu Nov 17 17:35:39 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 17 Nov 2022 17:35:39 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <b-ilYV971NN2aY5tLCzEfCIk8SmXMGA-x6M_FM2_VgA=.c38b72d5-1f9e-4813-a1e9-e5e1c213d2d5@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <b-ilYV971NN2aY5tLCzEfCIk8SmXMGA-x6M_FM2_VgA=.c38b72d5-1f9e-4813-a1e9-e5e1c213d2d5@github.com> Message-ID: <SPPsCVkUK62eNZGEWqOOWVpv0S_i72bZX1OvBXpZ12I=.5d317adc-e6d4-47a9-8d46-9c8d2cda4b70@github.com> On Thu, 17 Nov 2022 15:36:01 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> add @requires vm.jvmti to VirtualStackTraceTest > > test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest/VirtualStackTraceTest.java line 27: > >> 25: * @test >> 26: * @summary Verifies JVMTI GetStackTrace does not truncate virtual thread stack trace with agent attach >> 27: * @requires vm.continuations > > @requires vm.jvmti is also needed Thank you for the suggestion. Added now. 
------------- PR: https://git.openjdk.org/jdk/pull/11204 From sviswanathan at openjdk.org Thu Nov 17 18:14:25 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 17 Nov 2022 18:14:25 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v3] In-Reply-To: <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> Message-ID: <iXI36w_0YjmY0UPpspZRDe6ZMXEiGM_BDaO4NS8QOGM=.fe990924-094f-433d-9f4b-a25aa5a0f42f@github.com> On Thu, 10 Nov 2022 20:11:46 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: >> >> - x86_64: AVX, AVX2 and AVX512 >> - aarch64: platforms that support the advanced SIMD instructions >> >> Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. >> >> Special thanks to the folks who have made many helpful comments while this PR was in draft form. > > Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: > > replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations May be @nick-arm could review and approve for aarch64. 
------------- PR: https://git.openjdk.org/jdk/pull/7702 From cjplummer at openjdk.org Thu Nov 17 18:46:24 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 17 Nov 2022 18:46:24 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> Message-ID: <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> On Thu, 17 Nov 2022 17:35:32 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The `VirtualThread` static field `notifyJvmtiEvents` is not set correctly in cases JVMTI agents are loaded into running VM. It is because an extra call to java_lang_VirtualThread::init_static_notify_jvmti_events() is needed. >> This function is called once at the VM initialization, so this extra call is not necessary for agent loaded at startup. >> >> Testing: >> New test is added: `test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest` >> This test is failed without fix and passed with it. >> TBD: run all JVMTI and JDI test in mach5. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > add @requires vm.jvmti to VirtualStackTraceTest Changes requested by cjplummer (Reviewer). src/hotspot/share/prims/jvmtiExport.cpp line 385: > 383: if (Continuations::enabled()) { > 384: // Virtual threads support. There is a performance impact when VTMS transitions are enabled. 
> 385: if (!java_lang_VirtualThread ::notify_jvmti_events()) { remove extra space before :: src/hotspot/share/prims/jvmtiExport.cpp line 390: > 388: ThreadInVMfromNative tiv(JavaThread::current()); > 389: java_lang_VirtualThread::init_static_notify_jvmti_events(); > 390: } Doesn't this logic mean that if the first pass through this code is made with an unattached thread, then that will prevent subsequent passes from calling `init_static_notify_jvmti_events`, even if the thread is attached. The reason is because `java_lang_VirtualThread::set_notify_jvmti_events(true);` will already have been done, so you won't pass the `if (!java_lang_VirtualThread ::notify_jvmti_events())` check. ------------- PR: https://git.openjdk.org/jdk/pull/11204 From jnimeh at openjdk.org Thu Nov 17 18:53:24 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Thu, 17 Nov 2022 18:53:24 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v3] In-Reply-To: <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> Message-ID: <emKiLeQ71GF0MnhnB12gWSYRIg7ZSe8Efl0tnxPv300=.2c03f75a-2e3f-4708-a864-557119222cfa@github.com> On Thu, 10 Nov 2022 20:11:46 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: >> >> - x86_64: AVX, AVX2 and AVX512 >> - aarch64: platforms that support the advanced SIMD instructions >> >> Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. >> >> Special thanks to the folks who have made many helpful comments while this PR was in draft form. 
> > Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: > > replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations Another pair of arm-knowledgeable eyes on this is always welcome! ------------- PR: https://git.openjdk.org/jdk/pull/7702 From aph at openjdk.org Thu Nov 17 19:00:29 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 17 Nov 2022 19:00:29 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v10] In-Reply-To: <EJkCeTy8dOSzkBwHfx4snp6BBcIER2htjrWTxwFIeOY=.9fc75771-6651-4602-b187-6006bceed663@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <E7HwXXjC4P7B2RXor6pCyw9g8ZaW63bkvF_vZBw7Obs=.bf6400de-54cc-429f-a898-d27328e542f6@github.com> <EJkCeTy8dOSzkBwHfx4snp6BBcIER2htjrWTxwFIeOY=.9fc75771-6651-4602-b187-6006bceed663@github.com> Message-ID: <3wlfTnsbrVrppFImhVNfweUmw7nm9L2az9SOlqiNCAk=.c3478ce5-b4ab-46ba-a0e9-c7667d9cf5b2@github.com> On Thu, 17 Nov 2022 16:53:13 GMT, Paul Sandoz <psandoz at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: >> >> - Javadoc changes. >> - ProblemList.txt cleanup > > src/hotspot/share/utilities/exceptions.cpp line 166: > >> 164: // Remove the ScopedValue cache in case we got a virtual machine >> 165: // Error while we were trying to manipulate ScopedValue bindings. >> 166: thread->set_scopedValueCache(NULL); > > I am see this pattern repeat quite often: > > thread->set_scopedValueCache(NULL); > oop threadObj = thread->vthread(); > assert(threadOop != NULL, "must be"); // <--- sometimes > java_lang_Thread::clear_scopedValueBindings(threadObj); > > Encapsulate in a method on the `JavaThread` class? That sounds good. 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From vlivanov at openjdk.org Thu Nov 17 19:36:33 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 17 Nov 2022 19:36:33 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21] In-Reply-To: <oaQBTRrtslpMJcgFY4XfZBse2Vgo9p1EMqvDzE2fFj8=.a7e037a5-2a7c-4f1b-b846-6f6f6394b21e@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <oaQBTRrtslpMJcgFY4XfZBse2Vgo9p1EMqvDzE2fFj8=.a7e037a5-2a7c-4f1b-b846-6f6f6394b21e@github.com> Message-ID: <Bt4UNZU2itTeHs_2ojFCD64AXpGPiI8gveUtRg5mea0=.2926b137-f31e-4505-9a96-815e4f5ab851@github.com> On Thu, 17 Nov 2022 03:23:49 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 
154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > vzeroall, no spill, reg re-map Overall, looks good. Just one minor cleanup suggestion. I've submitted the latest patch for testing (hs-tier1 - hs-tier4). src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 377: > 375: __ shlq(t0, 40); > 376: __ addq(a1, t0); > 377: if (a2 == noreg) { Please, get rid of early return and turn the check into `if (a2 != noreg) { ... }` which guards the following code. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From psandoz at openjdk.org Thu Nov 17 20:33:36 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 17 Nov 2022 20:33:36 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v10] In-Reply-To: <E7HwXXjC4P7B2RXor6pCyw9g8ZaW63bkvF_vZBw7Obs=.bf6400de-54cc-429f-a898-d27328e542f6@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <E7HwXXjC4P7B2RXor6pCyw9g8ZaW63bkvF_vZBw7Obs=.bf6400de-54cc-429f-a898-d27328e542f6@github.com> Message-ID: <PbLj4HZspA7Ti-YypZe3kivzksHh4Jjewfk0r0xclUU=.0f2dc4f8-adca-47fa-9c2c-eb80029890d2@github.com> On Wed, 16 Nov 2022 16:55:24 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: > > - Javadoc changes. 
> - ProblemList.txt cleanup src/java.base/share/classes/java/lang/Thread.java line 744: > 742: > 743: // special value to mean a new thread > 744: this.scopedValueBindings = Thread.class; Perhaps: static final Object NEW_THREAD_BINDINGS = Thread.class; ... this.scopedValueBindings = NEW_THREAD_BINDINGS; ? src/java.base/share/classes/java/lang/Thread.java line 1614: > 1612: } > 1613: > 1614: @Hidden Should we document that the name and signature are special (same for other relevant methods not already documented)? src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 49: > 47: * > 48: * <p> {@code ScopedValue} defines the {@link #where(ScopedValue, Object, Runnable)} > 49: * method to set the value of a {@code ScopedValue} for the bouned period of execution by Suggestion: * method to set the value of a {@code ScopedValue} for the bounded period of execution by src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 241: > 239: } > 240: > 241: static final class EmptySnapshot extends Snapshot { We could make `Snapshot` final and have a static final field? static final Snapshot EMPTY_SNAPSHOT = new Snapshot(); src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 391: > 389: * JVM_FindScopedValueBindings(). > 390: */ > 391: private <R> R runWith(Snapshot newSnapshot, Callable<R> op) throws Exception { Missing `@Hidden` and `@ForceInline`, like with the other `runWith` method accepting `Runnable`? (I was gonna suggest changing the name to `callWith`, but then reread the docs and VM code. It's convenient to have the same names.) 
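The named-sentinel idea raised above for `Thread.java` line 744 can be tried out in isolation. What follows is only a toy sketch: `BindingsHolder` and this `Snapshot` are hypothetical stand-ins, not the JDK's `Thread` or `ScopedValue.Snapshot`, and the meaning of the `null` state is left to the separate branch that the reviewed code already has.

```java
// Toy model only: BindingsHolder and Snapshot are hypothetical stand-ins,
// not the JDK's Thread or ScopedValue.Snapshot classes.
final class Snapshot {
    // Single shared instance standing in for "no bindings".
    static final Snapshot EMPTY_SNAPSHOT = new Snapshot();
}

final class BindingsHolder {
    // Named sentinel instead of a bare class literal at each use site.
    static final Object NEW_THREAD_BINDINGS = BindingsHolder.class;

    // A freshly created holder starts in the "new thread" state.
    Object scopedValueBindings = NEW_THREAD_BINDINGS;

    // Returns the snapshot to use, or null when the caller has to take
    // the separate "bindings == null" path.
    Snapshot currentSnapshot() {
        Object bindings = scopedValueBindings;
        if (bindings == NEW_THREAD_BINDINGS) {
            // Identity comparison against the sentinel.
            return Snapshot.EMPTY_SNAPSHOT;
        } else if (bindings == null) {
            return null;
        }
        return (Snapshot) bindings;
    }
}
```

The point of the sketch is readability: both the write (`scopedValueBindings = NEW_THREAD_BINDINGS`) and the read (`bindings == NEW_THREAD_BINDINGS`) name the state they mean, and the empty case is one shared instance rather than a dedicated subclass.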
src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 672: > 670: return EmptySnapshot.getInstance(); > 671: } > 672: if (bindings == null) { Suggestion: if (bindings == NEW_THREAD_BINDINGS) { // This must be a new thread return Snapshot.EMPTY_SNAPSHOT; } else if (bindings == null) { src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 720: > 718: // is invoked, we record the result of the lookup in this per-thread cache > 719: // for fast access in future. > 720: private static class Cache { Make the class final and remove the qualifier on all the methods. src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 824: > 822: } > 823: > 824: public static void invalidate() { Is this method used? ------------- PR: https://git.openjdk.org/jdk/pull/10952 From duke at openjdk.org Thu Nov 17 20:42:27 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 17 Nov 2022 20:42:27 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v22] In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <encTlnf9qtjfjtVa-jDoWJMcUc6AwRtSDj7tk_OyBM0=.9728a3c6-6009-4873-9cb3-28ac8c262282@github.com> > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. 
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: remove early return ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/56aed9b1..08ea45e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=20-21 Stats: 29 lines in 1 file changed: 13 ins; 14 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Thu Nov 17 20:42:27 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 17 Nov 2022 20:42:27 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21] In-Reply-To: <Bt4UNZU2itTeHs_2ojFCD64AXpGPiI8gveUtRg5mea0=.2926b137-f31e-4505-9a96-815e4f5ab851@github.com> References: 
<wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <oaQBTRrtslpMJcgFY4XfZBse2Vgo9p1EMqvDzE2fFj8=.a7e037a5-2a7c-4f1b-b846-6f6f6394b21e@github.com> <Bt4UNZU2itTeHs_2ojFCD64AXpGPiI8gveUtRg5mea0=.2926b137-f31e-4505-9a96-815e4f5ab851@github.com> Message-ID: <irGzi76yjKKgch-zgHgreUxprgdldCOQV1_p9RXCuoQ=.2f4a9cae-e119-448d-8de4-e326e3e79b47@github.com> On Thu, 17 Nov 2022 19:30:14 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> vzeroall, no spill, reg re-map > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 377: > >> 375: __ shlq(t0, 40); >> 376: __ addq(a1, t0); >> 377: if (a2 == noreg) { > > Please, get rid of early return and turn the check into `if (a2 != noreg) { ... }` which guards the following code. done (some golang-ism slipped in.. rewiring habits again) ------------- PR: https://git.openjdk.org/jdk/pull/10582 From aturbanov at openjdk.org Thu Nov 17 20:45:00 2022 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Thu, 17 Nov 2022 20:45:00 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> Message-ID: <wp-vc-bZCiQP5Iw-mYM6khxj2GwgbXeI-7UloZQJzrU=.483cd087-1a11-4e78-8da7-eac45f249fc6@github.com> On Mon, 7 Nov 2022 13:24:26 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. 
> > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. > > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) test/hotspot/jtreg/runtime/ErrorHandling/TestReentrantErrorHandler.java line 160: > 158: System.err.println("<end hs_err contents>"); > 159: } > 160: throw new RuntimeException("hs-err file incomplete (first missing pattern: \"" + pattern[currentPattern].pattern() + "\")"); Nit Suggestion: throw new RuntimeException("hs-err file incomplete (first missing pattern: \"" + pattern[currentPattern].pattern() + "\")"); ------------- PR: https://git.openjdk.org/jdk/pull/11017 From stefank at openjdk.org Thu Nov 17 21:25:47 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Nov 2022 21:25:47 GMT Subject: RFR: 8296785: Use realloc for CHeap-allocated BitMaps [v3] In-Reply-To: <gVFqJ_xOx-LskzaCU9mylHrccLnoCOAWu-zeK_IEzUU=.97674576-c596-48b2-8249-ec62f2edccf2@github.com> References: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> <gVFqJ_xOx-LskzaCU9mylHrccLnoCOAWu-zeK_IEzUU=.97674576-c596-48b2-8249-ec62f2edccf2@github.com> Message-ID: <3lb5AH1ez6NrQDEsIjOWXDS9d2xC3-LgNkqITT9woEA=.b493237d-96ce-449b-89f1-e20362f77d82@github.com> On Wed, 16 Nov 2022 17:03:21 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> Today CHeap allocated bitmaps 
don't resize with realloc. I'd like to change that by fixing that by adding support for realloc in the ArrayAllocator classes, and then use that when resizing the bitmaps. >> >> We've been using and testing one version of this patch in the Generational ZGC repository for a while now. That version is slightly different because of recent rewrites of the bitmaps, but in essence the same. See: >> https://github.com/openjdk/zgc/commit/ca692f686bda8d86d3786c2afc782bfdc54fbdfc > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Fixes after merge > - Merge remote-tracking branch 'upstream/master' into 8296785_bitmap_realloc > - 8296785: Use realloc for CHeap-allocated BitMaps > - Merge remote-tracking branch 'upstream/master' into 8296774_bitmap_stricter_construction > - 8296774: Removed default MEMFLAGS value from CHeapBitMap Thanks for the reviews! ------------- PR: https://git.openjdk.org/jdk/pull/11102 From stefank at openjdk.org Thu Nov 17 21:27:35 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 17 Nov 2022 21:27:35 GMT Subject: Integrated: 8296785: Use realloc for CHeap-allocated BitMaps In-Reply-To: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> References: <x_hI8lT5-LheB51VLIo4hf2OslBRSNJbADshpwvIUeQ=.a4c6b472-3aeb-47f1-9703-06d0da23b72b@github.com> Message-ID: <gjj5eMnGTgtT-5EULz5PlmjpqlSyS__TZoUtlroj4Tc=.15502d70-9909-4944-9d55-aad730ec147e@github.com> On Fri, 11 Nov 2022 06:39:32 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > Today CHeap allocated bitmaps don't resize with realloc. I'd like to change that by fixing that by adding support for realloc in the ArrayAllocator classes, and then use that when resizing the bitmaps. > > We've been using and testing one version of this patch in the Generational ZGC repository for a while now. 
That version is slightly different because of recent rewrites of the bitmaps, but in essence the same. See: > https://github.com/openjdk/zgc/commit/ca692f686bda8d86d3786c2afc782bfdc54fbdfc This pull request has now been integrated. Changeset: 373e52c0 Author: Stefan Karlsson <stefank at openjdk.org> URL: https://git.openjdk.org/jdk/commit/373e52c0ab0d4fd3c6b18e67e0c46d1d1f0ac91e Stats: 226 lines in 5 files changed: 174 ins; 30 del; 22 mod 8296785: Use realloc for CHeap-allocated BitMaps Reviewed-by: stuefe, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/11102 From sspitsyn at openjdk.org Fri Nov 18 01:44:24 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 18 Nov 2022 01:44:24 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> Message-ID: <Q2D34sCDDHjKpZ9noTdfhEYVWpQJYGCtCweRHY0nMOI=.be997afa-6ed8-42c5-ab71-a4342b4af2e1@github.com> On Thu, 17 Nov 2022 18:37:09 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> add @requires vm.jvmti to VirtualStackTraceTest > > src/hotspot/share/prims/jvmtiExport.cpp line 385: > >> 383: if (Continuations::enabled()) { >> 384: // Virtual threads support. There is a performance impact when VTMS transitions are enabled. >> 385: if (!java_lang_VirtualThread ::notify_jvmti_events()) { > > remove extra space before :: Thank you. Will fix it. 
------------- PR: https://git.openjdk.org/jdk/pull/11204 From sspitsyn at openjdk.org Fri Nov 18 02:16:57 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 18 Nov 2022 02:16:57 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> Message-ID: <J6SwffJKTrjlGnKmBjDW4Sb5rH-yCeIG_U2qJh82uEo=.0cdaf244-7365-4cf7-8d64-d9acdf40c602@github.com> On Thu, 17 Nov 2022 18:41:08 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> add @requires vm.jvmti to VirtualStackTraceTest > > src/hotspot/share/prims/jvmtiExport.cpp line 390: > >> 388: ThreadInVMfromNative tiv(JavaThread::current()); >> 389: java_lang_VirtualThread::init_static_notify_jvmti_events(); >> 390: } > > Doesn't this logic mean that if the first pass through this code is made with an unattached thread, then that will prevent subsequent passes from calling `init_static_notify_jvmti_events`, even if the thread is attached. The reason is because `java_lang_VirtualThread::set_notify_jvmti_events(true);` will already have been done, so you won't pass the `if (!java_lang_VirtualThread ::notify_jvmti_events())` check. Enabling the `notify_jvmti_events` is an optimization to avoid having this notification overhead with JVMTI virtual thread mount state transitions when it is not needed. We need to enable it only once and never disable it if enabled. 
The first attempt to enable it is at startup if there was any agent loaded with command line options. In such a case, the get_jvmti_interface() is called in the context of `AgentOnLoad()` in an unattached thread. It only sets the `java_lang_VirtualThread::set_notify_jvmti_events(true)`. The `init_static_notify_jvmti_events()` is called from the `javaClasses_init()`. The agents that are loaded into running VM are initialized with the `AgentOnAttach()`. In this case, we can't rely on the `javaClasses_init()` and so, have to explicitly call the `init_static_notify_jvmti_events()`. I feel like this can be simplified. I keep thinking about the best way to do it. Probably, the pair <set_notify_jvmti_events, init_static_notify_jvmti_events> can be replaced with just one function. The problem is that we can't use the `ThreadInVMfromNative` helper for an unattached thread. ------------- PR: https://git.openjdk.org/jdk/pull/11204 From dholmes at openjdk.org Fri Nov 18 02:26:18 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 18 Nov 2022 02:26:18 GMT Subject: RFR: JDK-8296796: Provide clean, platform-agnostic interface to C-heap trimming [v2] In-Reply-To: <vBtNU4U04jMPutD1fcypaDk77fqS3pAbiVg7QHZa1i4=.624dd7f0-3500-4153-9c5f-56df44eef1f2@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> <vBtNU4U04jMPutD1fcypaDk77fqS3pAbiVg7QHZa1i4=.624dd7f0-3500-4153-9c5f-56df44eef1f2@github.com> Message-ID: <1AnmnfpkVl5wlrToUvDxOzrivkGZP-1qFwaM0b3Xu-Y=.4ef3ed1a-02be-4b9b-93f1-84e7dd7d62c8@github.com> On Mon, 14 Nov 2022 07:32:25 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. >> >> We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). 
This patch reshapes this code, cleaning it up in an OS-agnostic way. That will allow us to add implementations for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback David Okay. I have some reservations about this style of approach but the precedents are there. I'd argue that for single-use situations like this and os::map_stack_shadow_pages that a XXX_ONLY(foo();) in the shared code would be acceptable. Others may disagree. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/11089 From cjplummer at openjdk.org Fri Nov 18 03:47:03 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 18 Nov 2022 03:47:03 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <J6SwffJKTrjlGnKmBjDW4Sb5rH-yCeIG_U2qJh82uEo=.0cdaf244-7365-4cf7-8d64-d9acdf40c602@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> <J6SwffJKTrjlGnKmBjDW4Sb5rH-yCeIG_U2qJh82uEo=.0cdaf244-7365-4cf7-8d64-d9acdf40c602@github.com> Message-ID: <DH-QqVlH_CTpf03sF2EizUTJ6y7Wa9gkjCXRPlBr9ok=.454edca9-d054-4246-a209-92cb9ba55293@github.com> On Fri, 18 Nov 2022 02:11:39 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> src/hotspot/share/prims/jvmtiExport.cpp line 390: >> >>> 388: ThreadInVMfromNative tiv(JavaThread::current()); >>> 389: java_lang_VirtualThread::init_static_notify_jvmti_events(); >>> 390: } >> >> Doesn't this logic mean that if the first pass through this code is made with an unattached thread, then that will prevent subsequent passes from calling 
`init_static_notify_jvmti_events`, even if the thread is attached? The reason is that `java_lang_VirtualThread::set_notify_jvmti_events(true);` will already have been done, so you won't pass the `if (!java_lang_VirtualThread::notify_jvmti_events())` check.
>
> Enabling the `notify_jvmti_events` is an optimization to avoid having this notification overhead with JVMTI virtual thread mount state transitions when it is not needed.
> We need to enable it only once and never disable it if enabled.
> The first attempt to enable it is at startup if there was any agent loaded with command line options.
> In such a case, the get_jvmti_interface() is called in a context of `AgentOnLoad()` in an unattached thread.
> It only calls `java_lang_VirtualThread::set_notify_jvmti_events(true)`. The `init_static_notify_jvmti_events()` is called from the `javaClasses_init()`.
> The agents that are loaded into a running VM are initialized with the `AgentOnAttach()`.
> In this case, we can't rely on the `javaClasses_init()` and so have to explicitly call the `init_static_notify_jvmti_events()`.
> I feel like this can be simplified. I keep thinking about the best way to do it.
> Probably, the pair <set_notify_jvmti_events, init_static_notify_jvmti_events> can be replaced with just one function. The problem is that we can't use the `ThreadInVMfromNative` helper for an unattached thread.

If `notify_jvmti_events()` is false, then you call `set_notify_jvmti_events(true)`, which means you will never enter the `if` block again. However, if the thread is not attached, you do not call `init_static_notify_jvmti_events()`. What happens if later there is an attached thread that triggers this code? It seems that when that happens you should call `init_static_notify_jvmti_events()`, but won't because `notify_jvmti_events()` is true.
------------- PR: https://git.openjdk.org/jdk/pull/11204 From cjplummer at openjdk.org Fri Nov 18 03:47:04 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 18 Nov 2022 03:47:04 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <DH-QqVlH_CTpf03sF2EizUTJ6y7Wa9gkjCXRPlBr9ok=.454edca9-d054-4246-a209-92cb9ba55293@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> <J6SwffJKTrjlGnKmBjDW4Sb5rH-yCeIG_U2qJh82uEo=.0cdaf244-7365-4cf7-8d64-d9acdf40c602@github.com> <DH-QqVlH_CTpf03sF2EizUTJ6y7Wa9gkjCXRPlBr9ok=.454edca9-d054-4246-a209-92cb9ba55293@github.com> Message-ID: <jIqEWFxIWvvAwjXZq8cyrslaRu2A_cVqE7m0rlr5U_o=.8ccc83d8-08e0-4e9f-8200-100209cc3973@github.com> On Fri, 18 Nov 2022 03:42:39 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: >> Enabling the `notify_jvmti_events` is an optimization to avoid having this notification overhead with JVMTI virtual thread mount state transitions when it is not needed. >> We need to enable it only once and never disable it if enabled. >> The first attempt to enable it is at startup if there was any agent loaded with command line options. >> In such a case, the get_jvmti_interface() is called in a context of `AgentOnLoad()` in unattached thread. >> It only sets the `java_lang_VirtualThread::set_notify_jvmti_events(true)` The `init_static_notify_jvmti_events()` is called from the `javaClasses_init()`. >> The agents that are loaded into running VM are initialized with the `AgentOnAttach()`. >> In this case, we can't rely on the `javaClasses_init()` and so, have to explicitly call the `init_static_notify_jvmti_events()`. >> I feels like this can be simplified. 
I keep thinking about the best way to do it. >> Probably, the pair <set_notify_jvmti_events, init_static_notify_jvmti_events> can be replaced with just function. The problem is that we can't use the `ThreadInVMfromNative` helper for unattached thread. > > If `notify_jvmti_events()` is false, then you call `set_notify_jvmti_events(true)`, which means you will never enter the `if` block again. However, if the thread is not attached, you do not call `init_static_notify_jvmti_events()`. What happens if later there is an attached thread that triggers this code? Is seem when that happens you should call `init_static_notify_jvmti_events()`, but won't because `notify_jvmti_events()` is true. I think you need a flag that tells you if `init_static_notify_jvmti_events()` has been called. ------------- PR: https://git.openjdk.org/jdk/pull/11204 From sspitsyn at openjdk.org Fri Nov 18 05:04:53 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 18 Nov 2022 05:04:53 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <jIqEWFxIWvvAwjXZq8cyrslaRu2A_cVqE7m0rlr5U_o=.8ccc83d8-08e0-4e9f-8200-100209cc3973@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> <J6SwffJKTrjlGnKmBjDW4Sb5rH-yCeIG_U2qJh82uEo=.0cdaf244-7365-4cf7-8d64-d9acdf40c602@github.com> <DH-QqVlH_CTpf03sF2EizUTJ6y7Wa9gkjCXRPlBr9ok=.454edca9-d054-4246-a209-92cb9ba55293@github.com> <jIqEWFxIWvvAwjXZq8cyrslaRu2A_cVqE7m0rlr5U_o=.8ccc83d8-08e0-4e9f-8200-100209cc3973@github.com> Message-ID: <cgGbLjgnx6pGTAiLEfzGEwf8XkXVZK8T76-NCobHmN8=.60b4a1ca-b5d4-4874-9ff1-eca1ed724ae3@github.com> On Fri, 18 Nov 2022 03:43:34 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: >> If 
`notify_jvmti_events()` is false, then you call `set_notify_jvmti_events(true)`, which means you will never enter the `if` block again. However, if the thread is not attached, you do not call `init_static_notify_jvmti_events()`. What happens if later there is an attached thread that triggers this code? It seems that when that happens you should call `init_static_notify_jvmti_events()`, but won't because `notify_jvmti_events()` is true.
>
> I think you need a flag that tells you if `init_static_notify_jvmti_events()` has been called.

A part of the initialization sequence we need to know is:

    create_vm() {
      . . .
      // Launch -agentlib/-agentpath and converted -Xrun agents
      if (Arguments::init_agents_at_startup()) {
        create_vm_init_agents(); => {
          <loads all agents and calls AgentOnLoad entry points> =>
          get_jvmti_interface() => set_notify_jvmti_events(true)
        }
      . . .
      init_globals() => javaClasses_init() => java_lang_VirtualThread::init_static_notify_jvmti_events()

The `create_vm_init_agents()` is called in the context of an unattached thread. In this context a call to `java_lang_VirtualThread::init_static_notify_jvmti_events()` is guaranteed to happen after all the agents were successfully loaded at startup and executed their `AgentOnLoad` entry points, which make calls to `vm->GetEnv()` that transitively call `get_jvmti_interface()` and `java_lang_VirtualThread::set_notify_jvmti_events(true)`. We can add a comment on this but I'm puzzled about how to make it clear and simple.
------------- PR: https://git.openjdk.org/jdk/pull/11204 From cjplummer at openjdk.org Fri Nov 18 05:23:20 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 18 Nov 2022 05:23:20 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <cgGbLjgnx6pGTAiLEfzGEwf8XkXVZK8T76-NCobHmN8=.60b4a1ca-b5d4-4874-9ff1-eca1ed724ae3@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> <J6SwffJKTrjlGnKmBjDW4Sb5rH-yCeIG_U2qJh82uEo=.0cdaf244-7365-4cf7-8d64-d9acdf40c602@github.com> <DH-QqVlH_CTpf03sF2EizUTJ6y7Wa9gkjCXRPlBr9ok=.454edca9-d054-4246-a209-92cb9ba55293@github.com> <jIqEWFxIWvvAwjXZq8cyrslaRu2A_cVqE7m0rlr5U_o=.8ccc83d8-08e0-4e9f-8200-100209cc3973@github.com> <cgGbLjgnx6pGTAiLEfzGEwf8XkXVZK8T76-NCobHmN8=.60b4a1ca-b5d4-4874-9ff1-eca1ed724ae3@github.com> Message-ID: <B_5cv6n6HpGp1qsXdgUrppL6y8kOV29MgDBxAYKDl_g=.4f8aaea0-56ff-4e3a-a36d-2251f0d7f7a4@github.com> On Fri, 18 Nov 2022 04:55:10 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> I think you need a flag that tells you if `init_static_notify_jvmti_events()` has been called. > > A part of the initialization sequence we need to know is: > > create_vm() { > . . . > // Launch -agentlib/-agentpath and converted -Xrun agents > if (Arguments::init_agents_at_startup()) { > create_vm_init_agents(); => { > <loads all agents and calls AgentOnLoad entry points> => > get_jvmti_interface() => set_notify_jvmti_events(true) > } > . . . > init_globals() => javaClasses_init() => java_lang_VirtualThread::init_static_notify_jvmti_events() > > The `create_vm_init_agents()` is called in the context of unattaching thread. 
> In this context a call to `java_lang_VirtualThread::init_static_notify_jvmti_events()` is guaranteed to happen after all the agents were successfully loaded at startup and executed their `AgentOnLoad` entry points, which make calls to `vm->GetEnv()` that transitively call `get_jvmti_interface()` and `java_lang_VirtualThread::set_notify_jvmti_events(true)`.
>
> The conclusion is that the `java_lang_VirtualThread::init_static_notify_jvmti_events()` is always called at startup (single-threaded execution mode) after all the agents are loaded.
> In contrast, all calls to `get_jvmti_interface()` from the `AgentOnAttach` entry points have to be in the context of attached threads. I'm wondering whether we could add an assert to ensure it is always the case.
> We can add a comment on this but I'm puzzled about how to make it clear and simple.

If there are no command line agents, then on startup `vthread_notify_jvmti_events` is not set to true. Because it is not true, when `javaClasses_init()` calls `init_static_notify_jvmti_events()`, it does nothing. The whole point of the code we are reviewing here is to make sure `init_static_notify_jvmti_events()` is called while `vthread_notify_jvmti_events == true` so it actually does something. However, the code here does not bother calling `init_static_notify_jvmti_events()` if the current thread is detached, but it does still set `vthread_notify_jvmti_events = true`. This means that if this code gets called a second time, this time with the current thread attached, it will not call `init_static_notify_jvmti_events()` due to `vthread_notify_jvmti_events == true`, but it seems it should be calling it.

What I believe to be the flaw here is that you call `set_notify_jvmti_events(true)` even if you don't call `init_static_notify_jvmti_events()`.
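
[Editorial sketch, not part of the original message.] The ordering concern described above can be modelled in a few lines of standalone C++. This is only a toy model under stated assumptions: `NotifyState`, `get_interface_as_reviewed` and `get_interface_with_init_flag` are hypothetical names invented here, not HotSpot code, and the second function is just one possible reading of the "separate flag" idea raised earlier in the thread. Whether a detached-thread call can actually happen in a running VM is exactly what the thread is debating.

```cpp
#include <cassert>

// Toy model (hypothetical names). It starts in the state of a running VM
// that was launched without command-line agents: the startup-time init
// already ran while the flag was false and therefore did nothing.
struct NotifyState {
  bool notify_flag = false;       // models vthread_notify_jvmti_events
  bool java_field_synced = false; // models VirtualThread.notifyJvmtiEvents being set
  bool init_done = false;         // extra flag used only by the second variant

  // Models init_static_notify_jvmti_events(): a no-op unless the flag is set.
  void init_static() {
    if (notify_flag) java_field_synced = true;
  }
};

// The logic as described in the review comment: the flag is always set,
// but the Java-side field is only synced when the caller is attached.
void get_interface_as_reviewed(NotifyState& s, bool attached) {
  if (!s.notify_flag) {
    s.notify_flag = true;
    if (attached) s.init_static();
  }
}

// Assumed alternative: track separately whether the init call has happened.
void get_interface_with_init_flag(NotifyState& s, bool attached) {
  s.notify_flag = true;
  if (attached && !s.init_done) {
    s.init_static();
    s.init_done = true;
  }
}

// Replays the sequence from the comment: detached caller first, then attached.
void check_scenarios() {
  NotifyState reviewed;
  get_interface_as_reviewed(reviewed, /*attached=*/false);
  get_interface_as_reviewed(reviewed, /*attached=*/true);
  assert(!reviewed.java_field_synced); // the second call never reaches init_static()

  NotifyState flagged;
  get_interface_with_init_flag(flagged, /*attached=*/false);
  get_interface_with_init_flag(flagged, /*attached=*/true);
  assert(flagged.java_field_synced);   // the second call performs the missing init
}
```

In this model the problem only shows up when a detached caller gets in first in an already running VM. If detached callers are limited to startup, the later unconditional init covers that case.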
------------- PR: https://git.openjdk.org/jdk/pull/11204 From xuelei at openjdk.org Fri Nov 18 05:48:13 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 18 Nov 2022 05:48:13 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v8] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <oC6LQeIf4CEqbBCGPpuFaCDK4MsbBX9s2PhAE9KU464=.e3748d7f-ed20-4072-b7d6-d4de67d22d43@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. > > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: assert os::snprintf return value ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/f2158c8b..dcd7a8df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=06-07 Stats: 271 lines in 23 files changed: 181 ins; 1 del; 89 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Fri Nov 18 06:12:11 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 18 Nov 2022 06:12:11 GMT Subject: RFR: JDK-8296796: Provide clean, platform-agnostic interface to C-heap trimming [v2] In-Reply-To: <1AnmnfpkVl5wlrToUvDxOzrivkGZP-1qFwaM0b3Xu-Y=.4ef3ed1a-02be-4b9b-93f1-84e7dd7d62c8@github.com> References: 
<7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> <vBtNU4U04jMPutD1fcypaDk77fqS3pAbiVg7QHZa1i4=.624dd7f0-3500-4153-9c5f-56df44eef1f2@github.com> <1AnmnfpkVl5wlrToUvDxOzrivkGZP-1qFwaM0b3Xu-Y=.4ef3ed1a-02be-4b9b-93f1-84e7dd7d62c8@github.com> Message-ID: <PgQoMvTVLPfUlOdmLoRG4ZhKKX5x2jj77FqYgSwU8Oo=.c7e748e5-1a01-4ad9-8e98-52122862ea61@github.com> On Fri, 18 Nov 2022 02:23:54 GMT, David Holmes <dholmes at openjdk.org> wrote: > Okay. I have some reservations about this style of approach but the precedents are there. I'd argue that for single-use situations like this and os::map_stack_shadow_pages that a XXX_ONLY(foo();) in the shared code would be acceptable. Others may disagree. > Thanks David. I myself lean towards ifdefs too than having to search for the one platform implementation that actually does something. But I caught flak in the past for too many ifdefs. If it annoys us after [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114) and no second platform implementation is forthcoming, I'll remove the APIs from shared os. 
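
[Editorial sketch, not part of the original message.] The two styles weighed in this exchange, a platform-agnostic interface with no-op fallbacks versus an `XXX_ONLY(...)` macro at the call site, can be contrasted in standalone C++. All names below (`GLIBC_ONLY`, `can_trim_native_heap`, `trim_native_heap`, `trim_with_macro`) are illustrative assumptions and are not taken from the patch. The only real library call is glibc's `malloc_trim`.

```cpp
#include <cassert>
#include <cstddef>   // on glibc systems this also makes __GLIBC__ visible

#if defined(__GLIBC__)
#include <malloc.h>  // malloc_trim
#define GLIBC_ONLY(code) code
#else
#define GLIBC_ONLY(code)
#endif

// Style 1: shared code calls a platform-agnostic pair of functions.
// Platforms without an implementation report "unsupported" and do nothing.
bool can_trim_native_heap() {
#if defined(__GLIBC__)
  return true;
#else
  return false;
#endif
}

// Returns true if a trim was attempted on this platform.
bool trim_native_heap() {
#if defined(__GLIBC__)
  malloc_trim(0);    // ask glibc to release free heap memory to the OS
  return true;
#else
  return false;
#endif
}

// Style 2: shared code contains the platform-specific call directly,
// wrapped in a macro that expands to nothing elsewhere.
bool trim_with_macro() {
  bool attempted = false;
  GLIBC_ONLY(malloc_trim(0); attempted = true;)
  return attempted;
}

// Both styles should agree on whether a trim was attempted.
void self_check() {
  assert(trim_native_heap() == can_trim_native_heap());
  assert(trim_with_macro() == can_trim_native_heap());
}
```

The trade-off matches the discussion. Style 2 keeps a single-use call visible where it happens. Style 1 avoids ifdefs in shared code, at the cost of stub implementations on platforms that do nothing.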
------------- PR: https://git.openjdk.org/jdk/pull/11089 From sspitsyn at openjdk.org Fri Nov 18 06:19:00 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 18 Nov 2022 06:19:00 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <B_5cv6n6HpGp1qsXdgUrppL6y8kOV29MgDBxAYKDl_g=.4f8aaea0-56ff-4e3a-a36d-2251f0d7f7a4@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> <J6SwffJKTrjlGnKmBjDW4Sb5rH-yCeIG_U2qJh82uEo=.0cdaf244-7365-4cf7-8d64-d9acdf40c602@github.com> <DH-QqVlH_CTpf03sF2EizUTJ6y7Wa9gkjCXRPlBr9ok=.454edca9-d054-4246-a209-92cb9ba55293@github.com> <jIqEWFxIWvvAwjXZq8cyrslaRu2A_cVqE7m0rlr5U_o=.8ccc83d8-08e0-4e9f-8200-100209cc3973@github.com> <cgGbLjgnx6pGTAiLEfzGEwf8XkXVZK8T76-NCobHmN8=.60b4a1ca-b5d4-4874-9ff1-eca1ed724ae3@github.com> <B_5cv6n6HpGp1qsXdgUrppL6y8kOV29MgDBxAYKDl_g=.4f8aaea0-56ff-4e3a-a36d-2251f0d7f7a4@github.com> Message-ID: <8YB9Syeua6CdAhTa_gRUS_ddWv7_nXDkip8VRzYGgFc=.93d169b8-0779-4b99-b88f-8d5de380b4fa@github.com> On Fri, 18 Nov 2022 05:21:02 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: > What I believe to be the flaw here is that you call `set_notify_jvmti_events(true)` > even if you don't call `init_static_notify_jvmti_events()`. This only happen in a detached thread case which can be only at startup. It was implemented this way before my fix. The javaClasses has to be partially initialized before any call to `init_static_notify_jvmti_events()`: void javaClasses_init() { JavaClasses::compute_offsets(); <== This is needed JavaClasses::check_offsets(); java_lang_VirtualThread::init_static_notify_jvmti_events(); FilteredFieldsMap::initialize(); // must be done after computing offsets. 
} For detached thread case (which is at starup) it is guaranteed that the `init_static_notify_jvmti_events()` is unconditionally called later at initialization stage. Please, see the chain of calls: `create_vm() => init_globals() => javaClasses_init() => java_lang_VirtualThread::init_static_notify_jvmti_events()`. ------------- PR: https://git.openjdk.org/jdk/pull/11204 From xuelei at openjdk.org Fri Nov 18 06:42:24 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 18 Nov 2022 06:42:24 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v9] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <3vPeF8_u7AOdiGQRRw5xT66iRe9rhrb1OTDTt0Fef0k=.a3ad1514-2185-4e75-9a78-01e37e4daa8d@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: size_t cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/dcd7a8df..59a87dd1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=07-08 Stats: 38 lines in 18 files changed: 0 ins; 0 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Fri Nov 18 07:12:28 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 18 Nov 2022 07:12:28 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v9] In-Reply-To: <3vPeF8_u7AOdiGQRRw5xT66iRe9rhrb1OTDTt0Fef0k=.a3ad1514-2185-4e75-9a78-01e37e4daa8d@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <3vPeF8_u7AOdiGQRRw5xT66iRe9rhrb1OTDTt0Fef0k=.a3ad1514-2185-4e75-9a78-01e37e4daa8d@github.com> Message-ID: <Kh_w28uJ8eEATp3x0Is2SV8qTHYbqTYb2xN6vmDXSuw=.62bd9422-6b87-43fb-abd3-4c7ec3a29714@github.com> On Fri, 18 Nov 2022 06:42:24 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > size_t cast Hi @XueleiFan, the last version with the asserts looks fine to me. Thanks for your work! 
Cheers, Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/11115 From sspitsyn at openjdk.org Fri Nov 18 07:23:20 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 18 Nov 2022 07:23:20 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v3] In-Reply-To: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> Message-ID: <6NeJClVM3zGEb81DU6Dr7kQfABqziD-0aUTZ6QB-ryU=.b701a9d0-cb3f-40d6-a03d-c506f866f2c3@github.com> > The `VirtualThread` static field `notifyJvmtiEvents` is not set correctly in cases JVMTI agents are loaded into running VM. It is because an extra call to java_lang_VirtualThread::init_static_notify_jvmti_events() is needed. > This function is called once at the VM initialization, so this extra call is not necessary for agent loaded at startup. > > Testing: > New test is added: `test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest` > This test is failed without fix and passed with it. > TBD: run all JVMTI and JDI test in mach5. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: ajust condition when init_static_notify_jvmti_events() is called ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11204/files - new: https://git.openjdk.org/jdk/pull/11204/files/5a15a6da..a3123386 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11204&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11204&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/11204.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11204/head:pull/11204 PR: https://git.openjdk.org/jdk/pull/11204 From sspitsyn at openjdk.org Fri Nov 18 07:23:20 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 18 Nov 2022 07:23:20 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v2] In-Reply-To: <8YB9Syeua6CdAhTa_gRUS_ddWv7_nXDkip8VRzYGgFc=.93d169b8-0779-4b99-b88f-8d5de380b4fa@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6wNbVZY9ixmAKTwOZWRAaeGY_UBqT0Pz8mUvylHtKG8=.d3f13630-5005-48f1-974f-67a48d541afe@github.com> <5eG1Sr2mFsUTgbR7eKxeMVKHC1dOjmXpZV5khVDy6dU=.7473f409-7d2d-4236-9fc1-1779cb7a21e2@github.com> <J6SwffJKTrjlGnKmBjDW4Sb5rH-yCeIG_U2qJh82uEo=.0cdaf244-7365-4cf7-8d64-d9acdf40c602@github.com> <DH-QqVlH_CTpf03sF2EizUTJ6y7Wa9gkjCXRPlBr9ok=.454edca9-d054-4246-a209-92cb9ba55293@github.com> <jIqEWFxIWvvAwjXZq8cyrslaRu2A_cVqE7m0rlr5U_o=.8ccc83d8-08e0-4e9f-8200-100209cc3973@github.com> <cgGbLjgnx6pGTAiLEfzGEwf8XkXVZK8T76-NCobHmN8=.60b4a1ca-b5d4-4874-9ff1-eca1ed724ae3@github.com> <B_5cv6n6HpGp1qsXdgUrppL6y8kOV29MgDBxAYKDl_g=.4f8aaea0-56ff-4e3a-a36d-2251f0d7f7a4@github.com> <8YB9Syeua6CdAhTa_gRUS_ddWv7_nXDkip8VRzYGgFc=.93d169b8-0779-4b99-b88f-8d5de380b4fa@github.com> Message-ID: 
<QJkotQ3cMcAMB3c5pXFOZ8su91z7Dd141sAhXkC4jYo=.956fde87-9e95-4c55-8bfa-4c10242bb66f@github.com> On Fri, 18 Nov 2022 06:15:39 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> If there are no command line agents, then on startup `vthread_notify_jvmti_events` is not set true. Because it is not true, when `javaClasses_init()` calls `init_static_notify_jvmti_events()`, it does nothing. The whole point of the code we are reviewing here is to make sure `init_static_notify_jvmti_events()` is called while `vthread_notify_jvmti_events == true` so it actually does something. However, the code here does not bother calling `init_static_notify_jvmti_events()` if the current thread is detached, but it does still set `vthread_notify_jvmti_events = true`. This means that if this code gets called a second time, this time with the current thread attached, it will not call `init_static_notify_jvmti_events()` due to `vthread_notify_jvmti_events == true`, but it seems it should be calling it. >> >> What I believe to be the flaw here is that you call `set_notify_jvmti_events(true)` even if you don't call `init_static_notify_jvmti_events()`. > >> What I believe to be the flaw here is that you call `set_notify_jvmti_events(true)` >> even if you don't call `init_static_notify_jvmti_events()`. > > This only happen in a detached thread case which can be only at startup. > It was implemented this way before my fix. > The javaClasses has to be partially initialized before any call to `init_static_notify_jvmti_events()`: > > void javaClasses_init() { > JavaClasses::compute_offsets(); <== This is needed > JavaClasses::check_offsets(); > java_lang_VirtualThread::init_static_notify_jvmti_events(); > FilteredFieldsMap::initialize(); // must be done after computing offsets. > } > > > For detached thread case (which is at starup) it is guaranteed that the `init_static_notify_jvmti_events()` is unconditionally called later at initialization stage. 
Please, see the chain of calls: > `create_vm() => init_globals() => javaClasses_init() => java_lang_VirtualThread::init_static_notify_jvmti_events()`. > > There have to be no cases with a call to `set_notify_jvmti_events(true)` without a following call to `init_static_notify_jvmti_events()`: > - detached thread case: the `init_static_notify_jvmti_events()` is called later at startup sequence > - JavaThread case: the `init_static_notify_jvmti_events()` is always called explicitly after the call to `set_notify_jvmti_events(true)` > > You can think about the agents which do not call `GetEnv()` in`AgentOnLoad/AgentOnAttach` or have no `AgentOnLoad/AgentOnAttach` entry points. Such agents can later call `GetEnv()` from a detached thread. > Is it your concern? I've pushed an update. Please, let me know of you are okay with that. ------------- PR: https://git.openjdk.org/jdk/pull/11204 From xuelei at openjdk.org Fri Nov 18 07:39:41 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 18 Nov 2022 07:39:41 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v10] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <FKYmQ_DJj6_OlBjP5iK_1QqVd-C2u7e5ASMGKJ4tL_U=.d16461c0-d6f5-462e-aafa-4bad3202b82b@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: fix size_t cast warning on windows ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/59a87dd1..4fa31622 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=08-09 Stats: 24 lines in 2 files changed: 0 ins; 0 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Fri Nov 18 08:20:30 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 18 Nov 2022 08:20:30 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v11] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <xkHrwI89PL20KS1OWeYWYnZ5DXGd_3XN9fptfNUPVlc=.349ee0f0-20e3-41a1-95b2-1af4bdb3b3bf@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: more size_t updare for windows build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/4fa31622..c3da70cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=09-10 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From bkilambi at openjdk.org Fri Nov 18 10:21:32 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 18 Nov 2022 10:21:32 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v4] In-Reply-To: <j-0jvuuIm4sNxJlTzPFuzTI9qJISfkvZexu9bbPm3oU=.77e4e8f2-ab1c-41b8-9a2f-96911a82aedd@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> <j-0jvuuIm4sNxJlTzPFuzTI9qJISfkvZexu9bbPm3oU=.77e4e8f2-ab1c-41b8-9a2f-96911a82aedd@github.com> Message-ID: <Jh-DLeaze8jGgsL6lu0rf9CmaDlkroBr3yziqRrGtyM=.6fe04da1-df48-4d89-9d3f-3081619880e2@github.com> On Mon, 14 Nov 2022 09:37:53 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. 
Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Removed svesha3 feature check for eor3 @turbanoff Hello, I have made the changes you've suggested plus some more changes regarding the feature detection for svesha3 in the latest patch. Could you please take a look? Thank you in advance .. ------------- PR: https://git.openjdk.org/jdk/pull/10407 From aturbanov at openjdk.org Fri Nov 18 10:32:21 2022 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 18 Nov 2022 10:32:21 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v4] In-Reply-To: <j-0jvuuIm4sNxJlTzPFuzTI9qJISfkvZexu9bbPm3oU=.77e4e8f2-ab1c-41b8-9a2f-96911a82aedd@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> <j-0jvuuIm4sNxJlTzPFuzTI9qJISfkvZexu9bbPm3oU=.77e4e8f2-ab1c-41b8-9a2f-96911a82aedd@github.com> Message-ID: <BIAOc0-bbU0HFgIkpSs7YFd443NLG2aC58Bucbghcx4=.348513e0-feaa-4904-8b6c-b01492b18d37@github.com> On Mon, 14 Nov 2022 09:37:53 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. 
This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Removed svesha3 feature check for eor3 Marked as reviewed by aturbanov (Committer). 
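
[Editorial sketch, not part of the original message.] The rewrite described in the quoted text relies on a three-way XOR giving the same result as two chained XORs. A scalar C++ model of that equivalence is below. The function names are hypothetical, the loops are plain C++ rather than Neon or SVE2 code, and the kernel shape is an assumption. It is not taken from the `TestEor3` benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Models the two-instruction form: "eor a, a, b" followed by "eor a, a, c".
void xor_chained(uint32_t* a, const uint32_t* b, const uint32_t* c, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    a[i] = a[i] ^ b[i]; // first eor
    a[i] = a[i] ^ c[i]; // second eor
  }
}

// Models the fused form: one three-input XOR per lane.
void xor_fused(uint32_t* a, const uint32_t* b, const uint32_t* c, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    a[i] = a[i] ^ b[i] ^ c[i];
  }
}

// Runs both forms on the same input and checks they agree lane by lane.
bool forms_agree() {
  uint32_t x[4] = {0x0000ffffu, 0x12345678u, 0u, 0xffffffffu};
  uint32_t y[4] = {0x0000ffffu, 0x12345678u, 0u, 0xffffffffu};
  const uint32_t b[4] = {0x00ff00ffu, 0x9abcdef0u, 1u, 0xffffffffu};
  const uint32_t c[4] = {0x0f0f0f0fu, 0x0f1e2d3cu, 2u, 0x0000ffffu};
  xor_chained(x, b, c, 4);
  xor_fused(y, b, c, 4);
  for (std::size_t i = 0; i < 4; i++) {
    if (x[i] != y[i]) return false;
  }
  return true;
}
```

Because XOR is associative, the two loops always produce identical results, so a backend can match the chained pattern and emit a single instruction without changing program behaviour.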
------------- PR: https://git.openjdk.org/jdk/pull/10407 From rkennke at openjdk.org Fri Nov 18 12:58:18 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 18 Nov 2022 12:58:18 GMT Subject: RFR: JDK-8296796: Provide clean, platform-agnostic interface to C-heap trimming [v2] In-Reply-To: <vBtNU4U04jMPutD1fcypaDk77fqS3pAbiVg7QHZa1i4=.624dd7f0-3500-4153-9c5f-56df44eef1f2@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> <vBtNU4U04jMPutD1fcypaDk77fqS3pAbiVg7QHZa1i4=.624dd7f0-3500-4153-9c5f-56df44eef1f2@github.com> Message-ID: <_5KqnEOjBYRbx4MWGQYCKTwmRN_eSpfiqi8i7mGL_1A=.82a7edb7-4d1f-4b09-8461-5ef09f64b27a@github.com> On Mon, 14 Nov 2022 07:32:25 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. >> >> We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. That will allow us to add implementations for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback David Looks good to me. I kinda like the OS interface. If the proliferation of lots of unimplemented methods in os/* is a concern, we could provide default impls in shared if !__GLIBC__ as a middle-ground. WDYT? ------------- Marked as reviewed by rkennke (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11089 From tschatzl at openjdk.org Fri Nov 18 13:16:58 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 18 Nov 2022 13:16:58 GMT Subject: RFR: 8296954: G1: Enable parallel scanning for heap region remset In-Reply-To: <fqG3Jl3VV3LulSnoR6FoVIqQ3ETGtQZf9ZRS7mKVHDM=.2fc28e74-04df-48b5-a3d7-fd8902a86cc2@github.com> References: <fqG3Jl3VV3LulSnoR6FoVIqQ3ETGtQZf9ZRS7mKVHDM=.2fc28e74-04df-48b5-a3d7-fd8902a86cc2@github.com> Message-ID: <FKNKreS1s3_gDAnPWr-c2S2p3OkbCiyv7OsjBdY5D3w=.0ce776c5-bc81-4fa8-bdc0-deb9fe4731f1@github.com> On Tue, 15 Nov 2022 17:57:24 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote: > Hi all, > > Please review this change that allows parallel scanning of a heap region's remembered set. This gives a more balanced workload distribution in cases where cards are unevenly distributed among remembered sets. > > Testing: Tier 1-3 > > Thanks Lgtm. src/hotspot/share/gc/g1/g1RemSet.cpp line 919: > 917: uint hrm_index = r->hrm_index(); > 918: > 919: // we call prepare_remset_for_scan unconditionally because optional evacuation doesn't not call this method again. Suggestion: I do not think there is a need for a comment here. ------------- Marked as reviewed by tschatzl (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11173 From stuefe at openjdk.org Fri Nov 18 14:08:20 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 18 Nov 2022 14:08:20 GMT Subject: RFR: JDK-8296796: Provide clean, platform-agnostic interface to C-heap trimming [v2] In-Reply-To: <_5KqnEOjBYRbx4MWGQYCKTwmRN_eSpfiqi8i7mGL_1A=.82a7edb7-4d1f-4b09-8461-5ef09f64b27a@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> <vBtNU4U04jMPutD1fcypaDk77fqS3pAbiVg7QHZa1i4=.624dd7f0-3500-4153-9c5f-56df44eef1f2@github.com> <_5KqnEOjBYRbx4MWGQYCKTwmRN_eSpfiqi8i7mGL_1A=.82a7edb7-4d1f-4b09-8461-5ef09f64b27a@github.com> Message-ID: <WW2CUIVVZIJ32oqltsjj1lfRZnlB40H83iGT3tuQRt8=.af49804a-c99e-4830-841f-19a09d8c5002@github.com> On Fri, 18 Nov 2022 12:54:34 GMT, Roman Kennke <rkennke at openjdk.org> wrote: > Looks good to me. Thank you! > I kinda like the OS interface. If the proliferation of lots of unimplemented methods in os/* is a concern, we could provide default impls in shared if !__GLIBC__ as a middle-ground. WDYT? I fear this would annoy people even more than platform defines. Ioi did use something with "HAVE_xxxx_xxxx" macros to provide default implementations for similar things, but I think in this case we are talking about so little code that it's not worth the trouble.
------------- PR: https://git.openjdk.org/jdk/pull/11089 From jvernee at openjdk.org Fri Nov 18 14:54:52 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 18 Nov 2022 14:54:52 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v7] In-Reply-To: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> Message-ID: <Nzi7QqlRzEE_tu6l1g54E7c5BAGIg9tSyaR4nk_fr8E=.df313bdb-84b6-4ff4-b3b5-22ed34a26b2c@github.com> > Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. > > This is split off from the main JEP integration to make reviewing easier. > > This includes the following patches: > > 1. https://github.com/openjdk/panama-foreign/pull/698 > 2. https://github.com/openjdk/panama-foreign/pull/699 > 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 > 4. https://github.com/openjdk/panama-foreign/pull/740 > 5. https://github.com/openjdk/panama-foreign/pull/746 > 6. https://github.com/openjdk/panama-foreign/pull/742 > 7. https://github.com/openjdk/panama-foreign/pull/743 > > Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. > > The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. > > Please refer to the PR of each individual patch for a more detailed description. 
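To make the limitation described above concrete: a location expressed as a count of 32-bit stack slots can only name byte offsets that are multiples of 4, whereas a byte-offset representation has no such restriction. The following is a minimal sketch of that difference only. The type and field names are hypothetical and are not the actual `VMReg` or `VMStorage` definitions, and the register mask mentioned above is not modelled.

```java
import java.util.Optional;

final class StackLocationSketch {
    // Assumed slot size of 4 bytes (32 bits), as stated in the description above.
    static final int SLOT_SIZE_BYTES = 4;

    // Hypothetical slot-indexed location: can only express multiples of the slot size.
    record SlotLocation(int slotIndex) {
        int byteOffset() {
            return slotIndex * SLOT_SIZE_BYTES;
        }
    }

    // Hypothetical byte-addressed location: any offset and size can be expressed.
    record ByteLocation(int byteOffset, int sizeBytes) {}

    // Converts a byte-addressed location to a slot index where that is possible;
    // returns empty when the offset is not slot-aligned and so has no slot-based form.
    static Optional<SlotLocation> toSlot(ByteLocation loc) {
        if (loc.byteOffset() % SLOT_SIZE_BYTES != 0) {
            return Optional.empty();
        }
        return Optional.of(new SlotLocation(loc.byteOffset() / SLOT_SIZE_BYTES));
    }

    public static void main(String[] args) {
        System.out.println(toSlot(new ByteLocation(8, 4)).isPresent()); // slot-aligned
        System.out.println(toSlot(new ByteLocation(6, 2)).isPresent()); // not slot-aligned
    }
}
```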
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: 8296973: saving errno on a value-returning function crashes the JVM Reviewed-by: mcimadamore ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11019/files - new: https://git.openjdk.org/jdk/pull/11019/files/4d440443..0fa0e8cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=05-06 Stats: 197 lines in 9 files changed: 159 ins; 22 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/11019.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11019/head:pull/11019 PR: https://git.openjdk.org/jdk/pull/11019 From jvernee at openjdk.org Fri Nov 18 14:54:54 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 18 Nov 2022 14:54:54 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v6] In-Reply-To: <xJeSQMXY8k99ViKk6B1ceb6SIia3OCexyoHn-dLuUDY=.3d0d0ae6-64ea-4bb1-94cd-196d82cb8be4@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <xJeSQMXY8k99ViKk6B1ceb6SIia3OCexyoHn-dLuUDY=.3d0d0ae6-64ea-4bb1-94cd-196d82cb8be4@github.com> Message-ID: <Atk2MaWbVuNHKvEbhWoebTfD0ex2SHgwC4BvE0qKqLA=.97a30b5b-f301-4e2a-9efa-fd0da7dcd14a@github.com> On Wed, 16 Nov 2022 17:04:03 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. >> >> This is split off from the main JEP integration to make reviewing easier. >> >> This includes the following patches: >> >> 1. https://github.com/openjdk/panama-foreign/pull/698 >> 2. https://github.com/openjdk/panama-foreign/pull/699 >> 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 >> 4. https://github.com/openjdk/panama-foreign/pull/740 >> 5. https://github.com/openjdk/panama-foreign/pull/746 >> 6. 
https://github.com/openjdk/panama-foreign/pull/742 >> 7. https://github.com/openjdk/panama-foreign/pull/743 >> >> Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. >> >> The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. >> >> Please refer to the PR of each individual patch for a more detailed description. > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > fix stubs I've added one more bug fix to this pull request, which fixes a crash when using `CaptureCallState`. See the original PR: https://github.com/openjdk/panama-foreign/pull/753 See also the commit for a concise diff of the changes: https://github.com/openjdk/jdk/pull/11019/commits/0fa0e8cff14bd1f4978f4b28c5c3ddceda302d20 ------------- PR: https://git.openjdk.org/jdk/pull/11019 From aph at openjdk.org Fri Nov 18 15:05:42 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Nov 2022 15:05:42 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v11] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <nGXhxc7fXpK6UayTXFSfaE406z1kfyQiS4SOVUBo2oU=.07531d82-960f-4fdb-b1b4-cb0bfdba683d@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Reviewer feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/280cd6c5..d9e5979f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=09-10 Stats: 19 lines in 2 files changed: 8 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From alanb at openjdk.org Fri Nov 18 15:23:27 2022 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 18 Nov 2022 15:23:27 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v11] In-Reply-To: <nGXhxc7fXpK6UayTXFSfaE406z1kfyQiS4SOVUBo2oU=.07531d82-960f-4fdb-b1b4-cb0bfdba683d@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <nGXhxc7fXpK6UayTXFSfaE406z1kfyQiS4SOVUBo2oU=.07531d82-960f-4fdb-b1b4-cb0bfdba683d@github.com> Message-ID: <h3B01LlqVD3uiXhhGeln_rF6krBDWyUbw0oOCh69ZYU=.9d1c5d6e-4ef5-464e-a232-20c95d81b5d3@github.com> On Fri, 18 Nov 2022 15:05:42 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Reviewer feedback src/java.base/share/classes/java/lang/Thread.java line 787: > 785: > 786: // special value to mean a new thread > 787: this.scopedValueBindings = Thread.class; The addition of NEW_THREAD_BINDINGS means this one should change too. The update means the comment should probably be adjusted too, maybe "initial value for a new thread". 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Fri Nov 18 15:55:30 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Nov 2022 15:55:30 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v12] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <aEvqqAxZB1RHeCBva98kz_0Cl0-l6ZQnf6Kq8Z46PXg=.13c8fb76-c409-4ce5-b844-8e27029e95eb@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Reviewer feedback - Reviewer feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/d9e5979f..22cdc1af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=10-11 Stats: 64 lines in 7 files changed: 30 ins; 19 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Fri Nov 18 15:55:30 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Nov 2022 15:55:30 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v10] In-Reply-To: <3wlfTnsbrVrppFImhVNfweUmw7nm9L2az9SOlqiNCAk=.c3478ce5-b4ab-46ba-a0e9-c7667d9cf5b2@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <E7HwXXjC4P7B2RXor6pCyw9g8ZaW63bkvF_vZBw7Obs=.bf6400de-54cc-429f-a898-d27328e542f6@github.com> <EJkCeTy8dOSzkBwHfx4snp6BBcIER2htjrWTxwFIeOY=.9fc75771-6651-4602-b187-6006bceed663@github.com> <3wlfTnsbrVrppFImhVNfweUmw7nm9L2az9SOlqiNCAk=.c3478ce5-b4ab-46ba-a0e9-c7667d9cf5b2@github.com> Message-ID: 
<PLekEKqSXVm5Rlo6C3R1PBdqQbLHZeQYpokjQ1GhoQc=.eb94740c-7d9d-4232-bcba-eab00ef97897@github.com> On Thu, 17 Nov 2022 18:58:11 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/hotspot/share/utilities/exceptions.cpp line 166: >> >>> 164: // Remove the ScopedValue cache in case we got a virtual machine >>> 165: // Error while we were trying to manipulate ScopedValue bindings. >>> 166: thread->set_scopedValueCache(NULL); >> >> I see this pattern repeated quite often: >> >> thread->set_scopedValueCache(NULL); >> oop threadObj = thread->vthread(); >> assert(threadObj != NULL, "must be"); // <--- sometimes >> java_lang_Thread::clear_scopedValueBindings(threadObj); >> >> Encapsulate in a method on the `JavaThread` class? > > That sounds good. Done. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From chagedorn at openjdk.org Fri Nov 18 16:01:44 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 18 Nov 2022 16:01:44 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v5] In-Reply-To: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> Message-ID: <irqVZ-M8DFtPPlkymykuhvO4h-lCFDJ9MEApdDxq9Fs=.d641fd91-1bb3-4481-9f8b-5878507abdec@github.com> > The DWARF debugging symbols emitted by Clang are different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: > > The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc.
This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. > > Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. > > I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Update algorithm to print char by char, skipping file separators on the fly and only caring about the actual filename (ignore prefix path when reading) - Merge branch 'master' into JDK-8293422 - Always read full filename and strip prefix path and only then cut filename to fit output buffer - Merge branch 'master' into JDK-8293422 - Merge branch 'master' into JDK-8293422 - Review comments from Thomas - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ - 8293422: DWARF emitted by Clang cannot be parsed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10287/files - new: https://git.openjdk.org/jdk/pull/10287/files/24f624f8..0b759abe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10287&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10287&range=03-04 Stats: 265775 lines in 3415 files changed: 132953 ins; 91111 del; 41711 mod Patch: https://git.openjdk.org/jdk/pull/10287.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10287/head:pull/10287 PR: https://git.openjdk.org/jdk/pull/10287 From aph at openjdk.org Fri Nov 18 16:02:40 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Nov 2022 16:02:40 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v13] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <gp9qQrgqSx4OVenJARWiXwyAISEVJBKIJjXhGBXDQ94=.6d7b748b-3874-49e2-9022-443fec7c8d6b@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructuredTaskScope.java Co-authored-by: Alan Bateman <Alan.Bateman at oracle.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/22cdc1af..cac85ad0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From chagedorn at openjdk.org Fri Nov 18 16:01:44 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 18 Nov 2022 16:01:44 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: <ofIMTnRPIIqTzVGjaaaBedoelHOGrD3aKbEi79M7RT8=.655c1544-0c68-4c21-90e1-38c276bb731e@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> <ZOj-Cb86tjWGXWJK8AQx4EB81bcerOGONb4P6vkPZkw=.0fb827e5-8c50-4174-a0fd-4f1142b8d4f2@github.com> <ofIMTnRPIIqTzVGjaaaBedoelHOGrD3aKbEi79M7RT8=.655c1544-0c68-4c21-90e1-38c276bb731e@github.com> Message-ID: <WiqK7d5QLhQawRryMdQDnEG7-HVJXOohPSzIWgEQtzg=.729714c8-170d-4e08-8e12-f15333a49114@github.com> On Wed, 16 Nov 2022 08:48:51 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> Yes, I think it would be good to get a second review of the DWARF parser changes. Maybe @tstuefe? > >> Yes, I think it would be good to get a second review of the DWARF parser changes. Maybe @tstuefe? > > I'm a bit swamped, but try to take a look later today. 
I've pushed an update and changed the algorithm in the following way to address @tstuefe's review comments: - Just read and ignore the filenames from DWARF which do not correspond to the one we are looking for (i.e. when `current_index != file_index`). - Reading the filename of interest (i.e. when `current_index == file_index`): - Read single chars and stop once the null terminator is found. - Reset the buffer when a file separator is found to skip the prefix path. - On `filename` buffer overflow: Keep reading, since we could still be reading a path prefix, and reset the buffer again when finding a file separator. - If the filename does not fit into the provided buffer, use a generic overflow message `<OVERFLOW>`. If that does not fit either, use the minimal filename `L`. This at least provides the source information `L:line_no`, which is almost always enough to get to the actual source code location. Doing it this way lets the parser succeed instead of failing. I've added some additional tests for the overflow scenarios and made `get_source_info()` only accept buffers with a length of at least 2, so that the minimal filename `L` always fits. I've submitted some testing again in our CI (results look good so far) and additionally ran gtests with Clang slowdebug, fastdebug and release builds. I've done some additional manual testing, both with GCC and Clang builds. I've also played around by changing the buffer size in `vmError.cpp`.
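The filename handling described in the list above can be sketched roughly as follows. This is an illustrative Java model and not the C++ code from the patch: the class and method names are hypothetical, `/` is assumed to be the only file separator, and `bufferLength` models the size of the C output buffer including its terminating NUL.

```java
final class DwarfFilenameSketch {
    static final String OVERFLOW_NAME = "<OVERFLOW>";
    static final String MINIMAL_NAME = "L";

    // raw: a path as stored in the DWARF file name table, ending with '\0'.
    // bufferLength: size of the (modelled) C output buffer, including the NUL.
    static String readFilename(String raw, int bufferLength) {
        if (bufferLength < 2) {
            throw new IllegalArgumentException("buffer must hold at least \"L\" plus NUL");
        }
        int capacity = bufferLength - 1; // characters that fit besides the NUL
        StringBuilder name = new StringBuilder();
        boolean overflow = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\0') {
                break; // null terminator: end of this file name entry
            }
            if (c == '/') {
                // Everything read so far was a path prefix: start over.
                name.setLength(0);
                overflow = false;
            } else if (name.length() < capacity) {
                name.append(c);
            } else {
                // Keep reading; a later separator may still reset the buffer.
                overflow = true;
            }
        }
        if (!overflow) {
            return name.toString();
        }
        // The actual filename did not fit: fall back to a placeholder that does.
        return OVERFLOW_NAME.length() <= capacity ? OVERFLOW_NAME : MINIMAL_NAME;
    }

    public static void main(String[] args) {
        System.out.println(readFilename("/src/hotspot/share/opto/compile.cpp\0", 15));
        System.out.println(readFilename("/src/hotspot/share/compiler/compileBroker.cpp\0", 15));
        System.out.println(readFilename("/src/hotspot/share/opto/compile.cpp\0", 2));
    }
}
```

Under these assumptions, a long directory prefix never causes a fallback on its own, because the overflow state is cleared at every separator; only the final path component decides between the real name, `<OVERFLOW>` and `L`.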
It works as expected: - buffer size 15, emitting `<OVERFLOW>` for filenames being too long: V [libjvm.so+0x8905fc] CompileWrapper::~CompileWrapper()+0x56 (compile.cpp:492) V [libjvm.so+0x8921e6] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1652 (compile.cpp:864) V [libjvm.so+0x77d171] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x179 (c2compiler.cpp:113) V [libjvm.so+0x8b0fc4] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x8e8 (<OVERFLOW>:2237) V [libjvm.so+0x8afc3d] CompileBroker::compiler_thread_loop()+0x3ed (<OVERFLOW>:1916) V [libjvm.so+0x8d047c] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x72 (<OVERFLOW>:58) V [libjvm.so+0xc63342] JavaThread::thread_main_inner()+0x144 (javaThread.cpp:699) V [libjvm.so+0xc631fa] JavaThread::run()+0x182 (javaThread.cpp:684) V [libjvm.so+0x1337633] Thread::call_run()+0x195 (thread.cpp:224) V [libjvm.so+0x10e38d7] thread_native_entry(Thread*)+0x19b (os_linux.cpp:710) - buffer size 2, emitting `L` for all filenames (being too long): V [libjvm.so+0x8905fc] CompileWrapper::~CompileWrapper()+0x56 (compile.cpp:492) V [libjvm.so+0x8921e6] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1652 (L:864) V [libjvm.so+0x77d171] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x179 (L:113) V [libjvm.so+0x8b0fc4] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x8e8 (L:2237) V [libjvm.so+0x8afc3d] CompileBroker::compiler_thread_loop()+0x3ed (L:1916) V [libjvm.so+0x8d047c] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x72 (L:58) V [libjvm.so+0xc6339c] JavaThread::thread_main_inner()+0x144 (L:699) V [libjvm.so+0xc63254] JavaThread::run()+0x182 (L:684) V [libjvm.so+0x1337693] Thread::call_run()+0x195 (L:224) V [libjvm.so+0x10e3937] thread_native_entry(Thread*)+0x19b (L:710) ------------- PR: https://git.openjdk.org/jdk/pull/10287 From chagedorn at openjdk.org Fri Nov 18 16:01:44 2022 From: chagedorn at openjdk.org 
(Christian Hagedorn) Date: Fri, 18 Nov 2022 16:01:44 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: <S4pT9MhCorn5XvwppyTIKDMWmutvDzcZShwzGbvwGAA=.45cbb5d3-3ef0-4b9b-ba7c-0ad8ab7152b9@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <jx2kU4A0gxp4YCjuO5EP-nxww4wpHTc-vAZd9pJ__TE=.2b87d2e4-8fc8-4bbf-837c-0bac60362121@github.com> <3CD6ryhzVPUpiJTkeepDevEr96HfvCkLdCR2AzmxqoA=.db139ad6-cbb6-4791-a6d1-4d28e54fdcd1@github.com> <-LPuNwZylHWH7M8ifGeNw27j7T7-kQzbKa4y0H6cHvM=.8ea671b0-980d-4a81-94b6-1ddab48c1fea@github.com> <S4pT9MhCorn5XvwppyTIKDMWmutvDzcZShwzGbvwGAA=.45cbb5d3-3ef0-4b9b-ba7c-0ad8ab7152b9@github.com> Message-ID: <B9rcIFrFva7wHxMfTGxdPIXDmKx6JUmfqdCKzcI6bpc=.b2af887c-6b7d-473d-a951-91e3598ae511@github.com> On Wed, 16 Nov 2022 16:07:36 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> In theory, they don't need to be exposed in the sense of being declared in the header. But in terms of readability, I've decided to put them in the private block of class `LineNumberProgram` which is the only user of these methods. What would be the advantages of moving them completely to `elfFile.cpp` as static helper functions? > > Oh, I generally just prefer to keep things locally if possible. Slim interfaces, less polluted global namespace, possibly (though not here) less include deps. Don't worry, if you prefer it this way, keep it in. I understand. I don't have a strong preference here but to be consistent with the existing DWARF parser code, I'd rather keep it like that. 
------------- PR: https://git.openjdk.org/jdk/pull/10287 From chagedorn at openjdk.org Fri Nov 18 16:06:38 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 18 Nov 2022 16:06:38 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v6] In-Reply-To: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> Message-ID: <Npv8aIJVyGBNXUAtEStPlQfbUUaOsGjxulLp209l2bQ=.35ab9699-da60-4171-b04f-469c8d5f793a@github.com> > The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: > > The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. > > Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. > > I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. 
I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Remove unused local variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10287/files - new: https://git.openjdk.org/jdk/pull/10287/files/0b759abe..3121d380 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10287&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10287&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10287.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10287/head:pull/10287 PR: https://git.openjdk.org/jdk/pull/10287 From stuefe at openjdk.org Fri Nov 18 16:13:43 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 18 Nov 2022 16:13:43 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v6] In-Reply-To: <Npv8aIJVyGBNXUAtEStPlQfbUUaOsGjxulLp209l2bQ=.35ab9699-da60-4171-b04f-469c8d5f793a@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <Npv8aIJVyGBNXUAtEStPlQfbUUaOsGjxulLp209l2bQ=.35ab9699-da60-4171-b04f-469c8d5f793a@github.com> Message-ID: <XfZSQ9aBPncxtqH_gXC2plIIxMjBq3huEbB4bmBDnCc=.635a6f46-e41b-4ff0-9ef1-5c3a0a6986e1@github.com> On Fri, 18 Nov 2022 16:06:38 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. 
As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused local variable Looks good. If tests are green, fine for me. Thanks! ------------- Marked as reviewed by stuefe (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10287 From coleenp at openjdk.org Fri Nov 18 16:47:18 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 18 Nov 2022 16:47:18 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <XDYYIWQnMlxs_8QWox0wJ0FuEQMtXDl9-NMRFtoGiTc=.25f6b442-1569-4867-8b2b-f2609d47ce0d@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file CSR is approved, please review. ------------- PR: https://git.openjdk.org/jdk/pull/11023 From aph at openjdk.org Fri Nov 18 16:47:37 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Nov 2022 16:47:37 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v14] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <4mNldDTjTMEAERukS9Hja6bvLGUGjiTAkGCpp2rdv3g=.b2e85e6b-6eff-4820-bb84-0a3998005c31@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 - Reviewer feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/cac85ad0..6de9a4cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=12-13 Stats: 27 lines in 2 files changed: 9 ins; 13 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Fri Nov 18 16:47:38 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Nov 2022 16:47:38 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v14] In-Reply-To: <6PNLRgFIjkvIRL_tIW0btKxdEapZI6_JC8roNRFBSws=.a6292551-ac7b-4f45-a851-3c4c614edc3b@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <FclwJZRKpjSGCNnl3yRZRRzmvJuQlXNHFmdcQ742p18=.f9ed06dd-4602-400d-aa91-2091b9f26982@github.com> <40rwB5m6Mskkevkwkj8B34o540txfesN7P-pOGWPfqA=.4cf0adb3-1e3d-4a87-b2bf-505c7f15d487@github.com> <6PNLRgFIjkvIRL_tIW0btKxdEapZI6_JC8roNRFBSws=.a6292551-ac7b-4f45-a851-3c4c614edc3b@github.com> Message-ID: <c1IuAwz3d9KBYXaUKXaZVUwdANTMZiDGaJX4bxhNOLY=.6c1e2d43-f8f8-45bf-b731-76c2727d71e9@github.com> On Wed, 16 Nov 2022 19:04:29 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Probably so, yes. I'll have a look at that along with caching failure. > > So I just did the experiment of caching failures and the result of `isBound()`. 
> > This test: > > > @Benchmark > @OutputTimeUnit(TimeUnit.NANOSECONDS) > public int thousandMaybeGets(Blackhole bh) throws Exception { > int result = 0; > for (int i = 0; i < 1_000; i++) { > if (ScopedValuesData.sl1.isBound()) { > result += ScopedValuesData.sl1.get(); > } > } > return result; > } > > > > Before and after: > > > ScopedValues.thousandMaybeGets avgt 10 13436.112 ± 20.885 ns/op > ScopedValues.thousandMaybeGets avgt 10 56.315 ± 0.583 ns/op > > > You may have a point. The experiment is on a branch called `JDK-8286666-cache-queries` in [My personal repo](https://github.com/theRealAph/jdk). > > I'd push it now but it's getting a bit late to make such changes now. WDYT? Fixed. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Fri Nov 18 17:05:32 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Nov 2022 17:05:32 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v10] In-Reply-To: <PbLj4HZspA7Ti-YypZe3kivzksHh4Jjewfk0r0xclUU=.0f2dc4f8-adca-47fa-9c2c-eb80029890d2@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <E7HwXXjC4P7B2RXor6pCyw9g8ZaW63bkvF_vZBw7Obs=.bf6400de-54cc-429f-a898-d27328e542f6@github.com> <PbLj4HZspA7Ti-YypZe3kivzksHh4Jjewfk0r0xclUU=.0f2dc4f8-adca-47fa-9c2c-eb80029890d2@github.com> Message-ID: <hPR0FzVfDlfqlficxPXRYGAoQF6muLfxbtju-V3MGgM=.29f8c0ae-c77d-4876-a8f9-8c6171932495@github.com> On Thu, 17 Nov 2022 20:18:42 GMT, Paul Sandoz <psandoz at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: >> >> - Javadoc changes. >> - ProblemList.txt cleanup > > src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 824: > >> 822: } >> 823: >> 824: public static void invalidate() { > > Is this method used? No.
------------- PR: https://git.openjdk.org/jdk/pull/10952 From alanb at openjdk.org Fri Nov 18 17:13:25 2022 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 18 Nov 2022 17:13:25 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <KO8y6nIjuVJZGYsqPG0tz0q20rgLxaXb3n_cCb4Nn9c=.a0501b05-affe-4a93-9a90-bd1fdf976265@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file Marked as reviewed by alanb (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11023 From aph at openjdk.org Fri Nov 18 17:19:24 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Nov 2022 17:19:24 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v15] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <ijL-Up8ZxxnWvQkYsbqxxt9_BvPfaszJ1FqzERpJtAE=.d91b0a69-af58-46e5-8eff-7d8f3f8b700c@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Reviewer feedback - Reviewer feedback Javadoc fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/6de9a4cc..17c458fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=13-14 Stats: 20 lines in 2 files changed: 3 ins; 4 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From alanb at openjdk.org Fri Nov 18 17:34:28 2022 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 18 Nov 2022 17:34:28 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v15] In-Reply-To: <ijL-Up8ZxxnWvQkYsbqxxt9_BvPfaszJ1FqzERpJtAE=.d91b0a69-af58-46e5-8eff-7d8f3f8b700c@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <ijL-Up8ZxxnWvQkYsbqxxt9_BvPfaszJ1FqzERpJtAE=.d91b0a69-af58-46e5-8eff-7d8f3f8b700c@github.com> Message-ID: <RPmfJsR3Kh1mGzc7Nd8ybgSnj_0eDiL2OeEECjF2puY=.0d5323fe-01c0-4c74-acb1-fb39cdbd69a9@github.com> On Fri, 18 Nov 2022 17:19:24 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: > > - Reviewer feedback > - Reviewer feedback Javadoc fixes src/java.base/share/classes/java/lang/Thread.java line 789: > 787: > 788: // special value to mean a new thread > 789: this.scopedValueBindings = NEW_THREAD_BINDINGS; Can we change the comment on this one to be the same as the other constructor? 
src/java.base/share/classes/java/lang/Thread.java line 1622: > 1620: // The VM recognizes this method as special, so any changes to the > 1621: // name or signature require corresponding changes in > 1622: // JVM_FindScopedValueBindings(). Minor nit but I'd prefer to keep things consistent with the existing code/style where we can. In this case, we can move the comment to move the annotations with the /* .. */ comments so it's the same as the other methods. src/java.base/share/classes/java/lang/VirtualThread.java line 316: > 314: } > 315: } > 316: @Hidden Missing line break. Suggestion: @Hidden ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Fri Nov 18 18:38:53 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Nov 2022 18:38:53 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v16] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <dWLr0Caum9r-yZdQhK2U0vhdNlpBIlRVyWckx80r6CU=.d60ef922-713f-473d-b468-17569b5c631c@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Add ensureMaterializedForStackWalk kludge for AArch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/17c458fa..86ce5bbd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=14-15 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From xuelei at openjdk.org Fri Nov 18 19:25:32 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 18 Nov 2022 19:25:32 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v12] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <mztLRX-PTuyfSXhNZR9d9z8Ax5pIz5j4UVIrIZVGst4=.ddc4ca8b-253a-4e25-96e5-0233465817da@github.com> > Hi, > > May I have this update reviewed? > > `sprintf` is deprecated in Xcode 14 because of security concerns, and its use causes build failures. The build can pass if warnings are disabled for code that uses `sprintf`. In the long run, `sprintf` could be replaced with `snprintf`. This patch checks whether `snprintf` can be used.
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: extra sizeof typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/c3da70cc..4f80245f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From cjplummer at openjdk.org Fri Nov 18 19:37:20 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 18 Nov 2022 19:37:20 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v3] In-Reply-To: <6NeJClVM3zGEb81DU6Dr7kQfABqziD-0aUTZ6QB-ryU=.b701a9d0-cb3f-40d6-a03d-c506f866f2c3@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6NeJClVM3zGEb81DU6Dr7kQfABqziD-0aUTZ6QB-ryU=.b701a9d0-cb3f-40d6-a03d-c506f866f2c3@github.com> Message-ID: <0VjqZd6vfDwmSlMEaWavZpccxh_TmCzoG16zKo7H70U=.e94823c7-88e3-4d32-8661-b4f1890da534@github.com> On Fri, 18 Nov 2022 07:23:20 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The `VirtualThread` static field `notifyJvmtiEvents` is not set correctly in cases JVMTI agents are loaded into running VM. It is because an extra call to java_lang_VirtualThread::init_static_notify_jvmti_events() is needed. >> This function is called once at the VM initialization, so this extra call is not necessary for agent loaded at startup. >> >> Testing: >> New test is added: `test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest` >> This test is failed without fix and passed with it. >> TBD: run all JVMTI and JDI test in mach5. 
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > ajust condition when init_static_notify_jvmti_events() is called Marked as reviewed by cjplummer (Reviewer). src/hotspot/share/prims/jvmtiExport.cpp line 390: > 388: java_lang_VirtualThread::init_static_notify_jvmti_events(); > 389: } > 390: } Yes, this looks good now. Removing the `if (!java_lang_VirtualThread ::notify_jvmti_events())` check means the `init_static_notify_jvmti_events()` can still be called on subsequent calls to this method. So if `init_static_notify_jvmti_events()` was not called the first time (due to not being in the LIVE phase), then it can still be called on subsequent calls to this method if not in the LIVE phase. ------------- PR: https://git.openjdk.org/jdk/pull/11204 From sspitsyn at openjdk.org Fri Nov 18 19:50:09 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 18 Nov 2022 19:50:09 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v3] In-Reply-To: <0VjqZd6vfDwmSlMEaWavZpccxh_TmCzoG16zKo7H70U=.e94823c7-88e3-4d32-8661-b4f1890da534@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6NeJClVM3zGEb81DU6Dr7kQfABqziD-0aUTZ6QB-ryU=.b701a9d0-cb3f-40d6-a03d-c506f866f2c3@github.com> <0VjqZd6vfDwmSlMEaWavZpccxh_TmCzoG16zKo7H70U=.e94823c7-88e3-4d32-8661-b4f1890da534@github.com> Message-ID: <7SvTtVqmDQOgd2Enr78L4wOkXIza1hjnkC9ssfLonRU=.f687d0fa-e6ed-4965-a49c-355bbc1f4ae7@github.com> On Fri, 18 Nov 2022 19:34:19 GMT, Chris Plummer <cjplummer at openjdk.org> wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> ajust condition when init_static_notify_jvmti_events() is called > > src/hotspot/share/prims/jvmtiExport.cpp line 390: > >> 388: java_lang_VirtualThread::init_static_notify_jvmti_events(); >> 389: } >> 390: } > > 
Yes, this looks good now. Removing the `if (!java_lang_VirtualThread ::notify_jvmti_events())` check means the `init_static_notify_jvmti_events()` can still be called on subsequent calls to this method. So if `init_static_notify_jvmti_events()` was not called the first time (due to not being in the LIVE phase), then it can still be called on subsequent calls to this method if not in the LIVE phase. Okay. Thank you for the review, Chris, ------------- PR: https://git.openjdk.org/jdk/pull/11204 From lmesnik at openjdk.org Fri Nov 18 19:57:07 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 18 Nov 2022 19:57:07 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v3] In-Reply-To: <6NeJClVM3zGEb81DU6Dr7kQfABqziD-0aUTZ6QB-ryU=.b701a9d0-cb3f-40d6-a03d-c506f866f2c3@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6NeJClVM3zGEb81DU6Dr7kQfABqziD-0aUTZ6QB-ryU=.b701a9d0-cb3f-40d6-a03d-c506f866f2c3@github.com> Message-ID: <i80mxihM20i93lyoAKlw6wvuS5B5jLbSK-QLSlKx7mU=.c0eff9e4-2b37-44ee-bd04-223c8277d30d@github.com> On Fri, 18 Nov 2022 07:23:20 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The `VirtualThread` static field `notifyJvmtiEvents` is not set correctly in cases JVMTI agents are loaded into running VM. It is because an extra call to java_lang_VirtualThread::init_static_notify_jvmti_events() is needed. >> This function is called once at the VM initialization, so this extra call is not necessary for agent loaded at startup. >> >> Testing: >> New test is added: `test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest` >> This test is failed without fix and passed with it. >> TBD: run all JVMTI and JDI test in mach5. 
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > ajust condition when init_static_notify_jvmti_events() is called Marked as reviewed by lmesnik (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11204 From duke at openjdk.org Fri Nov 18 20:01:11 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Fri, 18 Nov 2022 20:01:11 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <E4vmiApqmu80hBu0GrQlPdpoJQt-HJellO_d_vWMKYo=.42ab8128-bfae-4c02-961a-196f625c327d@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> <TRoDLfcCCxNIGwWPb4W1eJtT7BAp2zjZhFjvSH0aleM=.1fa97bf3-ec5c-464b-86a1-7d320b1f1178@github.com> <naoq1y802BM2oWDp-qLDcaGWfHUx9egRUNgeNXoFuOM=.3a96c815-ddfb-4b32-8d35-29de94db7295@github.com> <E4vmiApqmu80hBu0GrQlPdpoJQt-HJellO_d_vWMKYo=.42ab8128-bfae-4c02-961a-196f625c327d@github.com> Message-ID: <LuNOOGWIHbiT2xxntOB31ASuW4XV2nMCXDiTabwv3pA=.4a06eab1-08b5-4943-baf2-29ea19c55fc9@github.com> On Mon, 7 Nov 2022 21:19:50 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> @iklam thanks for sharing the information and details on the future work in this space. >> >>> By patching SharedStringsStress.java with this, I can get the CA1 and OA0 regions to be not aligned by GrainBytes, but that doesn't seem to cause the test to fail. >> >> I was actually referring to CA0 and CA1 in my figures (which I realized was not clear in my explanation earlier). >> Anyway, I now understand the existing mechanism works fine because the following conditions are maintained (which you have already mentioned in your comment): >> 1. G1 regions are at least 1MB, and are always a power of 2. >> 2. At dump time the objects are placed such that they do not cross `HeapRegion::min_region_size_in_words()` which I believe is 1M. 
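[Editorial sketch, not part of the original message.] The two conditions listed above can be sanity-checked with plain arithmetic. The class below is a hypothetical standalone illustration, not HotSpot code: `RegionBoundarySketch`, `GRANULE_BYTES` and `fitsInOneRegion` are made-up names, offsets are assumed to be measured from a heap base that is aligned to the region size, and the 1 MB granule is taken from the "which I believe is 1M" remark rather than verified against the sources.

```java
public class RegionBoundarySketch {
    // Assumed dump-time granule: archived objects never straddle a multiple of 1 MB.
    static final long GRANULE_BYTES = 1L << 20;

    /** True if the byte range [start, start + size) lies inside one region of regionBytes. */
    static boolean fitsInOneRegion(long start, long size, long regionBytes) {
        return start / regionBytes == (start + size - 1) / regionBytes;
    }

    public static void main(String[] args) {
        // Hypothetical 512 KB object at offset 2 MB + 424 KB: it stays inside one 1 MB granule.
        long start = 2 * GRANULE_BYTES + 424 * 1024;
        long size = 512 * 1024;

        // Power-of-two region sizes of at least 1 MB: every region boundary is also a
        // granule boundary, so the object cannot cross one.
        for (long region = GRANULE_BYTES; region <= 32 * GRANULE_BYTES; region <<= 1) {
            System.out.println((region >> 20) + " MB regions: "
                    + fitsInOneRegion(start, size, region));
        }

        // A collector using regions smaller than the granule gets no such guarantee.
        System.out.println("512 KB regions: " + fitsInOneRegion(start, size, 512 * 1024));
    }
}
```

Under these assumptions the check holds for every region size from 1 MB to 32 MB and fails for the 512 KB case, which is the kind of incompatibility a run-time mapping API might want to detect.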
>> >> Because of these two constraints, change in G1 region size at run time cannot result in objects crossing the region boundary. >> So if I update the G1 code such that at run time the regions are mapped at 1M boundary then I can get rid of the problem of objects crossing region boundary and the two tests also pass. >> >>> In any case, I think we can consider first changing the way the regions are written ([JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344)) so that they can be more easily mapped by various collectors. >> >> I agree ([JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344)) would make it easier to map them at run time and would be happy to contribute to it anyway possible. But again, that's a GC policy specific implementation detail. >> I guess you would agree we need to de-couple the CDS code from the GC policy details. While JDK-8296344 aims at decoupling the code at dump time, my aim with this PR is to achieve the same at run time by having GC-agnostic APIs. >> Moreover, the dump time mechanism should not affect the APIs used for mapping regions at run time (though the implementation may need to be adjusted). >> So, with this in mind do you think we can continue working on this PR, or do you believe the GC APIs this PR proposes to add would not be sufficient once JDK-8296344 is implemented? >> >>> (Also, tactically, we should probably first change G1 to use the new "Uniform API" you are thinking about, but leave the other collectors unchanged. This way, we can gradually test things out and fix the other collectors in subsequent RFEs). >> >> That makes sense. Ideally I should have done the implementation for other collectors in a separate RFEs. But I was worried if I the new APIs are flexible enough to support other non-G1 policies, and in an attempt to verify that I added the support for those policies as well. If it helps I can remove those commits and deliver them later in subsequent RFEs. 
> >> While JDK-8296344 aims at decoupling the code at dump time, my aim with this PR is to achieve the same at run time by having GC-agnostic APIs. Moreover, the dump time mechanism should not affect the APIs used for mapping regions at run time (though the implementation may need to be adjusted). > > I think it depends on how we want to change the dump time operations. If we decide to go with a single contiguous block, then the API for mapping this block into the runtime heap will look very different than what you have today: > > > bool ArchiveHeapLoader::get_heap_range_for_archive_regions(ArchiveHeapRegions* heap_regions, bool is_open) { > if (Universe::heap()->alloc_archive_regions(heap_regions->dumptime_regions(), > heap_regions->num_regions(), > heap_regions->runtime_regions(), > is_open)) { > > > Also, we should probably record the region boundary information in the archived objects. Something like "objects never span across 1MB boundaries". This may need to be passed to the runtime mapping API, so incompatible collectors (i.e., one uses 512KB regions) can reject the archived objects. > > One of my goal for JDK-8296344 is to optimize the archived objects for the collector chosen at dump time. For example, if you dump with SerialGC, the archived objects can be mapped efficiently without relocation when SerialGC is also chosen at run time, but may require relocation if G1 is chosen at run time. I am not sure if how this would affect the runtime mapping API. Maybe some sort of preference would need to be indicated. > > I think it would be best for us to think about the whole picture before committing to a design. Timing wise, I think we missed the JDK 20 release anyway, so we should have plenty time to come up with a good design for JDK 21. > > I also would like to hear from folks in our GC team. 
@tschatzl @stefank @iklam you mentioned [here](https://github.com/openjdk/jdk/pull/10970#issuecomment-1302600242) that you were not able to run the tests with the -agentvm option using a fastdebug build. It's a long shot but I am wondering if you still remember what error you were getting and which tests you were running. I recently tried running in agentvm mode with this PR and tests ran fine. The command I used was: `java -jar <path to jtreg.jar> -agentvm -verbose:all -jdk:<path to jdk to test> test/hotspot/jtreg:hotspot_cds` Some tests (like `runtime/cds/appcds/jvmti/ClassFileLoadHookTest.java`) had this error `Use -nativepath to specify the location of native code` but I see the error without this patch as well. ------------- PR: https://git.openjdk.org/jdk/pull/10970 From sspitsyn at openjdk.org Fri Nov 18 20:08:23 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 18 Nov 2022 20:08:23 GMT Subject: RFR: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM [v3] In-Reply-To: <6NeJClVM3zGEb81DU6Dr7kQfABqziD-0aUTZ6QB-ryU=.b701a9d0-cb3f-40d6-a03d-c506f866f2c3@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> <6NeJClVM3zGEb81DU6Dr7kQfABqziD-0aUTZ6QB-ryU=.b701a9d0-cb3f-40d6-a03d-c506f866f2c3@github.com> Message-ID: <pFLVcI20wFhhuzAmIctq-dPWHMyZiVUeMU9xgifJMMw=.b6cdd16d-850e-4dde-8f7d-680f5808a981@github.com> On Fri, 18 Nov 2022 07:23:20 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The `VirtualThread` static field `notifyJvmtiEvents` is not set correctly in cases JVMTI agents are loaded into running VM. It is because an extra call to java_lang_VirtualThread::init_static_notify_jvmti_events() is needed. >> This function is called once at the VM initialization, so this extra call is not necessary for agent loaded at startup.
>> >> Testing: >> New test is added: `test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest` >> This test is failed without fix and passed with it. >> TBD: run all JVMTI and JDI test in mach5. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > ajust condition when init_static_notify_jvmti_events() is called Thank you for review, Leonid. ------------- PR: https://git.openjdk.org/jdk/pull/11204 From sspitsyn at openjdk.org Fri Nov 18 20:55:37 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 18 Nov 2022 20:55:37 GMT Subject: Integrated: 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM In-Reply-To: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> References: <nP5hV1sSLHFBnOo9dXHanCPvUAOEUtj1iQJjSx2eBHU=.56a55871-83b8-4f3f-9b46-19d934cd0099@github.com> Message-ID: <tOok90y9KWNImL7-TYPoSYvq-x-_gwUdAKMOlNcW57M=.82a00c00-88db-4ea8-b124-b2b2436855be@github.com> On Thu, 17 Nov 2022 09:12:07 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: > The `VirtualThread` static field `notifyJvmtiEvents` is not set correctly in cases JVMTI agents are loaded into running VM. It is because an extra call to java_lang_VirtualThread::init_static_notify_jvmti_events() is needed. > This function is called once at the VM initialization, so this extra call is not necessary for agent loaded at startup. > > Testing: > New test is added: `test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualStackTraceTest` > This test is failed without fix and passed with it. > TBD: run all JVMTI and JDI test in mach5. This pull request has now been integrated. 
Changeset: 035eaeec Author: Serguei Spitsyn <sspitsyn at openjdk.org> URL: https://git.openjdk.org/jdk/commit/035eaeecabd484d6db629c8b4056fa4b3a73f960 Stats: 180 lines in 3 files changed: 180 ins; 0 del; 0 mod 8296324: JVMTI GetStackTrace truncates vthread stack trace for agents loaded into running VM Reviewed-by: cjplummer, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/11204 From stuefe at openjdk.org Sat Nov 19 06:48:24 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 19 Nov 2022 06:48:24 GMT Subject: RFR: JDK-8296796: Provide clean, platform-agnostic interface to C-heap trimming [v3] In-Reply-To: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> Message-ID: <n4UYS7KXQx6Fmc4Cz3-G0yd6Sd7c1_CdaeXLLOMvqhA=.5be36faa-2947-42d8-9ce3-79a908d1b5af@github.com> > This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. > > We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. That will allow us to add implementations for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8296796-factor-out-os-trim-native-heap - Feedback David - JDK-8296796-factor-out-os-trim-native-heap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11089/files - new: https://git.openjdk.org/jdk/pull/11089/files/1a642e0d..33a54c65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11089&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11089&range=01-02 Stats: 29294 lines in 703 files changed: 11866 ins; 14633 del; 2795 mod Patch: https://git.openjdk.org/jdk/pull/11089.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11089/head:pull/11089 PR: https://git.openjdk.org/jdk/pull/11089 From stuefe at openjdk.org Sat Nov 19 07:01:26 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 19 Nov 2022 07:01:26 GMT Subject: RFR: JDK-8296796: Provide clean, platform-agnostic interface to C-heap trimming [v3] In-Reply-To: <n4UYS7KXQx6Fmc4Cz3-G0yd6Sd7c1_CdaeXLLOMvqhA=.5be36faa-2947-42d8-9ce3-79a908d1b5af@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> <n4UYS7KXQx6Fmc4Cz3-G0yd6Sd7c1_CdaeXLLOMvqhA=.5be36faa-2947-42d8-9ce3-79a908d1b5af@github.com> Message-ID: <7MaaraKnQQG7ESYiRQIlB1stdcoSY-JYaq7TXPLrcKw=.503caf26-8529-4655-ac16-20e9a2676d39@github.com> On Sat, 19 Nov 2022 06:48:24 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. >> >> We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. That will allow us to add implementations for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier.
> > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8296796-factor-out-os-trim-native-heap > - Feedback David > - JDK-8296796-factor-out-os-trim-native-heap Okay, I manually tested fastdebug on Alpine and on 32-bit. Also re-merged. If all tests run through with no attributable errors, I'll push. Thanks @dholmes-ora @rkennke for reviewing! ------------- PR: https://git.openjdk.org/jdk/pull/11089 From sspitsyn at openjdk.org Sat Nov 19 07:16:29 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 19 Nov 2022 07:16:29 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM Message-ID: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> The `can_support_virtual_threads` capability was initially implemented as an onload capability. This is why it does not work for agents loaded into a running VM. The fix is to move it from the `onload` to the `always` capabilities list. Testing: A new test is added: VirtualThreadStartTest. TBD: mach5 jvmti, jdi and tier1-6 tests.
------------- Commit messages: - simplified VirtualThreadStartTest - 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM Changes: https://git.openjdk.org/jdk/pull/11246/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296323 Stats: 157 lines in 4 files changed: 153 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11246.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11246/head:pull/11246 PR: https://git.openjdk.org/jdk/pull/11246 From sspitsyn at openjdk.org Sat Nov 19 07:19:16 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 19 Nov 2022 07:19:16 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v2] In-Reply-To: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> Message-ID: <r7j5oedNi9yt93SmWwb3fN9NXgy8GCh-6bG74oMiiUk=.d8fe95a6-da46-43d0-b18a-b1a1474f3d18@github.com> > The `can_support_virtual_threads` capability was initially implemented as an onload capability. > This is why it does not work for agents loaded into a running VM. > The fix is to move it from the `onload` to the `always` capabilities list. > > Testing: > A new test is added: VirtualThreadStartTest. > TBD: mach5 jvmti, jdi and tier1-6 tests.
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: roll back unintended VirtualThread.java file update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11246/files - new: https://git.openjdk.org/jdk/pull/11246/files/74d67205..71ed522f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11246.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11246/head:pull/11246 PR: https://git.openjdk.org/jdk/pull/11246 From alanb at openjdk.org Sat Nov 19 08:18:40 2022 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 19 Nov 2022 08:18:40 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v2] In-Reply-To: <r7j5oedNi9yt93SmWwb3fN9NXgy8GCh-6bG74oMiiUk=.d8fe95a6-da46-43d0-b18a-b1a1474f3d18@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <r7j5oedNi9yt93SmWwb3fN9NXgy8GCh-6bG74oMiiUk=.d8fe95a6-da46-43d0-b18a-b1a1474f3d18@github.com> Message-ID: <kOmKNYflm1ZpNnNILLmPGCBeom9Ww_1fnaKPBpSckeo=.166b4d6a-7879-4a05-81a4-94e0124f1b3f@github.com> On Sat, 19 Nov 2022 07:19:16 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The `can_support_virtual_threads` capability was initially implemented as an onload capability. >> This is why it does not work for agents loaded into a running VM. >> The fix is to move it from the `onload` to the `always` capabilities list. >> >> Testing: >> A new test is added: VirtualThreadStartTest. >> TBD: mach5 jvmti, jdi and tier1-6 tests.
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > roll back unintended VirtualThread.java file update The update to allow can_support_virtual_threads in the live phase looks okay, I'm just surprised we missed that in JDK 19. The test looks okay too but it's not a complete unit test for the VirtualThreadEvent event. It only tests the late binding agent case (which is what this bug is about). Maybe it should be extended to run with -agentlib so that it starts in the onload phase or maybe rename so that it's clear what the test is for. ------------- PR: https://git.openjdk.org/jdk/pull/11246 From sspitsyn at openjdk.org Sat Nov 19 08:58:19 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 19 Nov 2022 08:58:19 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v2] In-Reply-To: <r7j5oedNi9yt93SmWwb3fN9NXgy8GCh-6bG74oMiiUk=.d8fe95a6-da46-43d0-b18a-b1a1474f3d18@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <r7j5oedNi9yt93SmWwb3fN9NXgy8GCh-6bG74oMiiUk=.d8fe95a6-da46-43d0-b18a-b1a1474f3d18@github.com> Message-ID: <CZR-axmyTDBrzx0OfxEYX-gyuBfoOLyDEitEXdcz9xs=.76002467-f626-4ef5-88b9-a4adcea70409@github.com> On Sat, 19 Nov 2022 07:19:16 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The can_support_virtual_thread was initially implemented as an onload capability. >> It is why this capability does not work for the agents loaded into running VM. >> The fix is to move it from `onload` to `always`capabilities list. >> >> Testing: >> New test is added: VirtualStartThreadTest. >> TBD: mach5 jvmti, jdi and tier1-6 tests. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > roll back unintended VirtualThread.java file update Thank you for looking at it, Alan. 
These late-binding agent issues were surprising to me too. We missed adding the relevant test coverage. I'll update the test to add the onload execution mode. ------------- PR: https://git.openjdk.org/jdk/pull/11246 From duke at openjdk.org Sat Nov 19 10:59:49 2022 From: duke at openjdk.org (ExE Boss) Date: Sat, 19 Nov 2022 10:59:49 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v16] In-Reply-To: <dWLr0Caum9r-yZdQhK2U0vhdNlpBIlRVyWckx80r6CU=.d60ef922-713f-473d-b468-17569b5c631c@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <dWLr0Caum9r-yZdQhK2U0vhdNlpBIlRVyWckx80r6CU=.d60ef922-713f-473d-b468-17569b5c631c@github.com> Message-ID: <qVP3IrEDJvOq6n5Xn2wq5dlK0HQb3D9jj9KSQn5pRRM=.007f9cb3-f275-4b0e-85bb-b87f55ef5e20@github.com> On Fri, 18 Nov 2022 18:38:53 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Add ensureMaterializedForStackWalk kludge for AArch64 src/hotspot/cpu/aarch64/aarch64.ad line 3637: > 3635: __ nop(); > 3636: __ block_comment("call JVM_EnsureMaterializedForStackWalk (elided)"); > 3637: } else { This should probably have its indentation fixed: Suggestion: } else { src/java.base/share/classes/java/lang/Thread.java line 1622: > 1620: // The VM recognizes this method as special, so any changes to the > 1621: // name or signature require corresponding changes in > 1622: // JVM_FindScopedValueBindings(). Suggestion: /** * The VM recognizes this method as special, so any changes to the * name or signature require corresponding changes in * JVM_FindScopedValueBindings().
*/ @Hidden @ForceInline ------------- PR: https://git.openjdk.org/jdk/pull/10952 From duke at openjdk.org Sat Nov 19 10:59:51 2022 From: duke at openjdk.org (ExE Boss) Date: Sat, 19 Nov 2022 10:59:51 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v15] In-Reply-To: <RPmfJsR3Kh1mGzc7Nd8ybgSnj_0eDiL2OeEECjF2puY=.0d5323fe-01c0-4c74-acb1-fb39cdbd69a9@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <ijL-Up8ZxxnWvQkYsbqxxt9_BvPfaszJ1FqzERpJtAE=.d91b0a69-af58-46e5-8eff-7d8f3f8b700c@github.com> <RPmfJsR3Kh1mGzc7Nd8ybgSnj_0eDiL2OeEECjF2puY=.0d5323fe-01c0-4c74-acb1-fb39cdbd69a9@github.com> Message-ID: <7OGkkIGaoN-F14YPRCl-WzugX_ZThVr-HThTDCqj5Ic=.247194ee-a343-4634-afc4-6e021b56e6d0@github.com> On Fri, 18 Nov 2022 17:27:28 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: >> >> - Reviewer feedback >> - Reviewer feedback Javadoc fixes > > src/java.base/share/classes/java/lang/VirtualThread.java line 316: > >> 314: } >> 315: } >> 316: @Hidden > > Missing line break. > Suggestion: > > > @Hidden Also include the comment: Suggestion: /** * The VM recognizes this method as special, so any changes to the * name or signature require corresponding changes in * JVM_FindScopedValueBindings().
*/ @Hidden ------------- PR: https://git.openjdk.org/jdk/pull/10952 From stuefe at openjdk.org Sat Nov 19 11:55:28 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 19 Nov 2022 11:55:28 GMT Subject: Integrated: JDK-8296796: Provide clean, platform-agnostic interface to C-heap trimming In-Reply-To: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> Message-ID: <0HizKQ0twgJ1Yq1h1Ha7CD2WUFrW1amaG_dnY6GeXAY=.b0782cb9-c2e1-4e66-b065-af950dce02e6@github.com> On Thu, 10 Nov 2022 13:23:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. > > We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. That will allow us to add implementions for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier. This pull request has now been integrated. 
Changeset: 0845b39c Author: Thomas Stuefe <stuefe at openjdk.org> URL: https://git.openjdk.org/jdk/commit/0845b39caf6f04dca9cb7a5852f05b4b5ffbc034 Stats: 141 lines in 10 files changed: 89 ins; 34 del; 18 mod 8296796: Provide clean, platform-agnostic interface to C-heap trimming Reviewed-by: dholmes, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/11089 From stuefe at openjdk.org Sat Nov 19 15:02:34 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 19 Nov 2022 15:02:34 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v5] In-Reply-To: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> Message-ID: <eoruHJaZINCm__X4z3Sg7dBegZMl53ewdnKNLTfw0tw=.07210e3f-79e5-440c-9ef2-06bde7163457@github.com> > This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. > > To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. > > --- > > Patch > > - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. > - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. > - Removed a stray newline from print_native_stack to clean output. > - added regression testing for this feature. I removed my name from the test since we don't do this anymore. 
> - added clarifying comments to the test and code > - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) > > Output looks like this: > > > $ java ... -XX:+ErrorLogSecondaryErrorDetails > > > will produce, for secondary errors, siginfo and call stack. > > > [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] > [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] > [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) > V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) > V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) > V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) > V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) > V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) > C [libc.so.6+0x43090] > V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) > C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) > C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) > ] Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge branch 'master' into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors - Feedback Axel - Merge branch 'JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors' of github.com:tstuefe/jdk into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors - Update test/hotspot/jtreg/runtime/ErrorHandling/SecondaryErrorTest.java remove blank Co-authored-by: Andrey Turbanov <turbanoff at gmail.com> - Feedback David - JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11118/files - new: https://git.openjdk.org/jdk/pull/11118/files/06a9bd58..984de277 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=03-04 Stats: 29457 lines in 715 files changed: 11957 ins; 14676 del; 2824 mod Patch: https://git.openjdk.org/jdk/pull/11118.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11118/head:pull/11118 PR: https://git.openjdk.org/jdk/pull/11118 From stuefe at openjdk.org Sun Nov 20 07:51:07 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 20 Nov 2022 07:51:07 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v5] In-Reply-To: <eoruHJaZINCm__X4z3Sg7dBegZMl53ewdnKNLTfw0tw=.07210e3f-79e5-440c-9ef2-06bde7163457@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <eoruHJaZINCm__X4z3Sg7dBegZMl53ewdnKNLTfw0tw=.07210e3f-79e5-440c-9ef2-06bde7163457@github.com> Message-ID: <ad8rA--eB3YFQ5weeZryofJJ0SQOL3J1Sap0PfU3ifM=.dc6fa7dc-a1cd-43e2-975a-cfb705afa40a@github.com> On Sat, 19 Nov 2022 15:02:34 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. 
>> >> To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. >> >> --- >> >> Patch >> >> - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. >> - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. >> - Removed a stray newline from print_native_stack to clean output. >> - added regression testing for this feature. I removed my name from the test since we don't do this anymore. >> - added clarifying comments to the test and code >> - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) >> >> Output looks like this: >> >> >> $ java ... -XX:+ErrorLogSecondaryErrorDetails >> >> >> will produce, for secondary errors, siginfo and call stack. 
>> >> >> [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] >> [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] >> [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) >> V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) >> V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) >> V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) >> V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) >> V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) >> C [libc.so.6+0x43090] >> V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) >> C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) >> C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) >> ] > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors > - Feedback Axel > - Merge branch 'JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors' of github.com:tstuefe/jdk into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors > - Update test/hotspot/jtreg/runtime/ErrorHandling/SecondaryErrorTest.java > > remove blank > > Co-authored-by: Andrey Turbanov <turbanoff at gmail.com> > - Feedback David > - JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors While playing with this, I thought this would be a lot better with - limiting printed stack size for secondary errors, either up to the before-last level of VMError::report invocation, or simply just n frames. Otherwise with subsequent crashes stacks get silly. - Maybe enabling the switch by default in debug builds? And in addition limit number of secondary crash callstacks to the first five. Subsequent ideas wrt recursive error limit: I think we could probably expand it. ------------- PR: https://git.openjdk.org/jdk/pull/11118 From sspitsyn at openjdk.org Sun Nov 20 08:42:33 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sun, 20 Nov 2022 08:42:33 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v3] In-Reply-To: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> Message-ID: <ijLvf9lYB8CQ82BZ-VqrBaRSOgt6kQWP73RadMwSXzc=.e8fd59fe-29a2-43cb-804d-177981daa3f7@github.com> > The can_support_virtual_thread was initially implemented as an onload capability. > It is why this capability does not work for the agents loaded into running VM. > The fix is to move it from `onload` to `always`capabilities list. 
> > Testing: > New test is added: VirtualStartThreadTest. > TBD: mach5 jvmti, jdi and tier1-6 tests. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: extended VirtualThreadStartTest to support more configs; fixed issue in jvmtiExport.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11246/files - new: https://git.openjdk.org/jdk/pull/11246/files/71ed522f..681e6927 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=01-02 Stats: 87 lines in 3 files changed: 66 ins; 9 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/11246.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11246/head:pull/11246 PR: https://git.openjdk.org/jdk/pull/11246 From sspitsyn at openjdk.org Sun Nov 20 08:48:16 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sun, 20 Nov 2022 08:48:16 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v4] In-Reply-To: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> Message-ID: <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> > The can_support_virtual_thread was initially implemented as an onload capability. > It is why this capability does not work for the agents loaded into running VM. > The fix is to move it from `onload` to `always`capabilities list. > > Testing: > New test is added: VirtualStartThreadTest. > TBD: mach5 jvmti, jdi and tier1-6 tests. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: fixed a trailing white space issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11246/files - new: https://git.openjdk.org/jdk/pull/11246/files/681e6927..8e408555 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11246.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11246/head:pull/11246 PR: https://git.openjdk.org/jdk/pull/11246 From sspitsyn at openjdk.org Sun Nov 20 08:57:19 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sun, 20 Nov 2022 08:57:19 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v4] In-Reply-To: <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> Message-ID: <dC6q9bNvUqayvKjGi2MJk2QrtuuH5CkprslFHVDmq3E=.bb3ff667-5c6d-4251-9de3-7e27bfdc64b2@github.com> On Sun, 20 Nov 2022 08:48:16 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The can_support_virtual_thread was initially implemented as an onload capability. >> It is why this capability does not work for the agents loaded into running VM. >> The fix is to move it from `onload` to `always`capabilities list. >> >> Testing: >> New test is added: VirtualStartThreadTest. >> TBD: mach5 jvmti, jdi and tier1-6 tests. 
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > fixed a trailing white space issue I've pushed an update to extend the test to cover more configurations: - agent loaded at startup and into running VM - with and without enabling the JVMTI `can_support_virtual_threads` capability One problem was discovered and fixed in `jvmtiExport.cpp`. The `ThreadStart` events posted on a virtual thread when `can_support_virtual_threads` is disabled supplied the carrier thread as the argument instead of the virtual thread. The cause was that the `jvmtiThreadState` may not be set at the time the `ThreadStart` events are posted. ------------- PR: https://git.openjdk.org/jdk/pull/11246 From dholmes at openjdk.org Mon Nov 21 01:50:20 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 21 Nov 2022 01:50:20 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <5OuxSKr8ZPZuCE9SOk2yCEaSO7nJ96pz1yx-yFbHixQ=.c1dfd974-8b9a-475a-b04a-47eac0c82dd9@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file Looks good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11023 From njian at openjdk.org Mon Nov 21 02:08:38 2022 From: njian at openjdk.org (Ningsheng Jian) Date: Mon, 21 Nov 2022 02:08:38 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v4] In-Reply-To: <j-0jvuuIm4sNxJlTzPFuzTI9qJISfkvZexu9bbPm3oU=.77e4e8f2-ab1c-41b8-9a2f-96911a82aedd@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> <j-0jvuuIm4sNxJlTzPFuzTI9qJISfkvZexu9bbPm3oU=.77e4e8f2-ab1c-41b8-9a2f-96911a82aedd@github.com> Message-ID: <CElL8NhIAUkSOPXvTnfeFqyNyVUbLa03r14Mm0T2Z5k=.2a3056c6-709b-42b9-8546-965584276f43@github.com> On Mon, 14 Nov 2022 09:37:53 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. 
> > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Removed svesha3 feature check for eor3 Marked as reviewed by njian (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/10407 From eliu at openjdk.org Mon Nov 21 02:08:38 2022 From: eliu at openjdk.org (Eric Liu) Date: Mon, 21 Nov 2022 02:08:38 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v4] In-Reply-To: <j-0jvuuIm4sNxJlTzPFuzTI9qJISfkvZexu9bbPm3oU=.77e4e8f2-ab1c-41b8-9a2f-96911a82aedd@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> <j-0jvuuIm4sNxJlTzPFuzTI9qJISfkvZexu9bbPm3oU=.77e4e8f2-ab1c-41b8-9a2f-96911a82aedd@github.com> Message-ID: <3Jp_EvePVgJJqhiwIh5_U2E2alw4sJf72ArNuWnQr90=.09a499da-8ad9-4176-a679-44e410a5efa9@github.com> On Mon, 14 Nov 2022 09:37:53 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. 
Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Removed svesha3 feature check for eor3 Marked as reviewed by eliu (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/10407 From jwaters at openjdk.org Mon Nov 21 02:45:19 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 21 Nov 2022 02:45:19 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <XXVqN4ByCrB34JRZSgiNYWsdrwEOTKjo5u81sTFG5bE=.7748c17a-4a13-42aa-b10d-219fe6775da2@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <gay-N6xDnfKHcngB9ddJIZD6Jfg2m_ZCzZn1gWPFN-o=.785036e8-1d1d-41d3-bac3-211b9d03cd71@github.com> <XXVqN4ByCrB34JRZSgiNYWsdrwEOTKjo5u81sTFG5bE=.7748c17a-4a13-42aa-b10d-219fe6775da2@github.com> Message-ID: <nVCoCE9doWoQ54qhPSbn1xM79YLzhggfELps4CpI53o=.3af1e9bf-5389-439e-bd78-1b4fce336c2f@github.com> On Tue, 15 Nov 2022 06:33:00 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert to using simpler solution similar to the original 8274980 > > src/hotspot/share/utilities/globalDefinitions.hpp line 50: > >> 48: >> 49: #ifndef ATTRIBUTE_ALIGNED >> 50: #define ATTRIBUTE_ALIGNED(x) alignas(x) > > HotSpot Group has not discussed or approved use of `alignas` - see https://bugs.openjdk.org/browse/JDK-8250269. This is another change that is independent of most of the rest of this PR, and should be dealt with separately. The various MSVC-conditional direct uses of `_declspec(align(N))` should probably currently be using `ATTRIBUTE_ALIGNED`. 
Out of curiosity, is there a way to get the discussion on approving the use of alignas back up? I've read through 8250269 briefly and unlike the issues that come with C++ attributes, alignas looks relatively straightforward to switch to, without much effect on existing code. Seems like a bit of a waste to leave the JBS entry sitting on the shelf to me > The various MSVC-conditional direct uses of __declspec(align(N)) should probably currently be using ATTRIBUTE_ALIGNED. The instances of `__declspec(align())` changed here are in the native libraries written in C, not within HotSpot itself. From what I can see at least HotSpot never uses compiler alignment attributes directly and always strictly sticks to `ATTRIBUTE_ALIGNED` (which is probably a good thing) ------------- PR: https://git.openjdk.org/jdk/pull/11081 From dholmes at openjdk.org Mon Nov 21 02:49:31 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 21 Nov 2022 02:49:31 GMT Subject: RFR: JDK-8296796: Provide clean, platform-agnostic interface to C-heap trimming [v3] In-Reply-To: <n4UYS7KXQx6Fmc4Cz3-G0yd6Sd7c1_CdaeXLLOMvqhA=.5be36faa-2947-42d8-9ce3-79a908d1b5af@github.com> References: <7-6YEY44bfzNqDeZhSaCHe0I_66CnTyaXC3TnZyRel0=.3cc1f6ec-fdea-466b-b576-ee9132989fb3@github.com> <n4UYS7KXQx6Fmc4Cz3-G0yd6Sd7c1_CdaeXLLOMvqhA=.5be36faa-2947-42d8-9ce3-79a908d1b5af@github.com> Message-ID: <fCcYStoy_wbK-Xc7CdIUPBJ-Ps_KRzGYLSVLjCJ9s0g=.b684da40-ae9b-4ffe-b132-b5fb4e3d4ffd@github.com> On Sat, 19 Nov 2022 06:48:24 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This is a breakout from [JDK-8293114](https://bugs.openjdk.org/browse/JDK-8293114), which is starved for reviews. So I attempt to break up that fix into smaller units which are hopefully easier to review separately. >> >> We can trim the C-heap manually using jcmd since [JDK-8268893](https://bugs.openjdk.org/browse/JDK-8268893). This patch reshapes this code, cleaning it up in an OS-agnostic way. 
That will allow us to add implementions for other platforms (I have this on my list for AIX at least) and make review of 8293114 easier. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8296796-factor-out-os-trim-native-heap > - Feedback David > - JDK-8296796-factor-out-os-trim-native-heap We had a strange quirk in our CI testing after this was pushed and it made me think a little differently about the change. If `rss_change` is only needed for the test then we should perhaps make it (and the test) debug only? Also the `rss_change` logic seems not to be thread-safe, so only useful for that single-threaded test. ------------- PR: https://git.openjdk.org/jdk/pull/11089 From aboldtch at openjdk.org Mon Nov 21 05:48:49 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 21 Nov 2022 05:48:49 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v2] In-Reply-To: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> Message-ID: <g17w1wVjzNMY24qTylC7Ymgfi1MzM-J0cV_HAlBuH2s=.cfe50eb7-f157-4493-97cc-f729f1fb6eda@github.com> > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. > > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. 
> > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Add reentrant reentry limits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11017/files - new: https://git.openjdk.org/jdk/pull/11017/files/3b0a453a..1cba0583 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=00-01 Stats: 155 lines in 3 files changed: 142 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/11017.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11017/head:pull/11017 PR: https://git.openjdk.org/jdk/pull/11017 From aboldtch at openjdk.org Mon Nov 21 06:01:46 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 21 Nov 2022 06:01:46 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v3] In-Reply-To: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> Message-ID: <WlBdSGKAURXKFLdQuJn9-kcEf3HvMXC21j0PGW32tYE=.a97c9263-2236-4f7f-ac5b-ef0a98e0fccd@github.com> > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. 
> > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. > > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. > > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11017/files - new: https://git.openjdk.org/jdk/pull/11017/files/1cba0583..28439928 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11017.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11017/head:pull/11017 PR: https://git.openjdk.org/jdk/pull/11017 From aboldtch at openjdk.org Mon Nov 21 06:04:57 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 21 Nov 2022 06:04:57 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v2] In-Reply-To: <g17w1wVjzNMY24qTylC7Ymgfi1MzM-J0cV_HAlBuH2s=.cfe50eb7-f157-4493-97cc-f729f1fb6eda@github.com> References: 
<s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> <g17w1wVjzNMY24qTylC7Ymgfi1MzM-J0cV_HAlBuH2s=.cfe50eb7-f157-4493-97cc-f729f1fb6eda@github.com> Message-ID: <DMhQAGoAmX9cVneCXhm_j0XnSHNPAejNn40rVPEB33E=.dd0dda69-62b6-40f5-b153-d113d4fe9b2d@github.com> On Mon, 21 Nov 2022 05:48:49 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. >> >> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. >> >> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. >> >> Enables the following >> ```C++ >> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) >> os::print_register_info_header(st, _context); >> >> REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) >> // decode register contents if possible >> ResourceMark rm(_thread); >> os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); >> REENTRANT_LOOP_END >> >> st->cr(); >> >> >> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Add reentrant reentry limits Added some limitations on reentry of a reentrant step. It will now break the inner loop if: * It is the fourth time reentering this step * It is the eighth time reentering any reentrant step * The stack headroom is less than 64K * A timeout has been issued The post loop logic of a reentrant step is given another timeout window.
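The four break conditions listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ReentryState`, `should_break_reentrant_loop` and the constant names are hypothetical and do not come from the patch; only the limit values (fourth reentry of a step, eighth reentry overall, 64K of stack headroom, a pending timeout) are taken from the description.

```cpp
#include <cstddef>

// Hypothetical bookkeeping for one reentrant step; the names are assumptions,
// not identifiers from vmError.cpp.
struct ReentryState {
  int step_reentries;                // times the current step has been reentered
  int total_reentries;               // reentries summed over all reentrant steps
  std::size_t stack_headroom_bytes;  // stack space still available
  bool timeout_issued;               // a timeout has been signalled
};

// Limit values as described in the mail; the mail itself calls them ad hoc.
constexpr int kMaxStepReentries = 4;
constexpr int kMaxTotalReentries = 8;
constexpr std::size_t kMinStackHeadroom = 64 * 1024;

// Returns true when the inner loop of a reentrant step should stop iterating.
inline bool should_break_reentrant_loop(const ReentryState& s) {
  return s.step_reentries >= kMaxStepReentries
      || s.total_reentries >= kMaxTotalReentries
      || s.stack_headroom_bytes < kMinStackHeadroom
      || s.timeout_issued;
}
```

Keeping all conditions in one predicate makes it easy to adjust a single constant when feedback on the limits comes in.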
Currently all it does is make sure there are line breaks after the step output, but I imagine this can be useful in case some reentrant step logic is used where the loop builds up some data structure and the post logic prints it. All of the limit constants are just picked rather ad-hoc. Would be nice to have some extra feedback on this. ------------- PR: https://git.openjdk.org/jdk/pull/11017 From xuelei at openjdk.org Mon Nov 21 06:18:13 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 21 Nov 2022 06:18:13 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 In-Reply-To: <klNVgLaAprREVI2aALAP1V9p7KHz_B2pyUhoFBJgqvo=.6742030d-5184-44e6-9b03-0c59c2a8d8a6@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <klNVgLaAprREVI2aALAP1V9p7KHz_B2pyUhoFBJgqvo=.6742030d-5184-44e6-9b03-0c59c2a8d8a6@github.com> Message-ID: <xX_LEhINRUIysQQQruq5udiX7HdAynfpTq_gllIVyaQ=.9475fdf8-27e6-414d-a992-9e20e761c5ca@github.com> On Sun, 13 Nov 2022 20:48:04 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> sprintf is deprecated in Xcode 14 because of security concerns, and its use causes build failures. The build could pass if warnings are disabled for code that uses sprintf. In the long run, sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. >> >> Thanks, >> Xuelei > > Please don't add uses of `jio_snprintf` or `::snprintf` to hotspot. Use `os::snprintf`. > > Regarding `jio_snprintf`, see https://bugs.openjdk.org/browse/JDK-8198918. > Regarding `os::snprintf` and `os::vsnprintf`, see https://bugs.openjdk.org/browse/JDK-8285506. > > I think the only reason we haven't marked `::sprintf` and `::snprintf` forbidden > (FORBID_C_FUNCTION) is there are a lot of uses, and nobody has gotten around > to dealing with it.
`::snprintf` is in the list of candidates for > https://bugs.openjdk.org/browse/JDK-8214976, some of which have already been > marked. But I don't see new bugs for the as-yet unmarked ones. > > As a general note, as a reviewer my preference is against non-trivial and > persnickety code changes that are scattered all over the code base. For > something like this I'd prefer multiple more bite-sized changes that were > dealing with specific uses. I doubt everyone agrees with me though. @kimbarrett, did you have further comments? I'm going to integrate this update this week. Please let me know if you need more time. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/11115 From jnimeh at openjdk.org Mon Nov 21 06:28:32 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Mon, 21 Nov 2022 06:28:32 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v4] In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> Message-ID: <7dgYny7behfzrurUf7PH-jfhKYdDIMV4hInwIlZwg-Y=.9a97cf5b-fe58-4855-911b-43ec710c9539@github.com> > This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: > > - x86_64: AVX, AVX2 and AVX512 > - aarch64: platforms that support the advanced SIMD instructions > > Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. > > Special thanks to the folks who have made many helpful comments while this PR was in draft form.
Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: Pull out common macro code into function parameter pack ------------- Changes: - all: https://git.openjdk.org/jdk/pull/7702/files - new: https://git.openjdk.org/jdk/pull/7702/files/8d4b7ba7..0fd87c28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=7702&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=7702&range=02-03 Stats: 44 lines in 1 file changed: 23 ins; 17 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/7702.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7702/head:pull/7702 PR: https://git.openjdk.org/jdk/pull/7702 From iklam at openjdk.org Mon Nov 21 06:55:24 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 21 Nov 2022 06:55:24 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <E4vmiApqmu80hBu0GrQlPdpoJQt-HJellO_d_vWMKYo=.42ab8128-bfae-4c02-961a-196f625c327d@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> <TRoDLfcCCxNIGwWPb4W1eJtT7BAp2zjZhFjvSH0aleM=.1fa97bf3-ec5c-464b-86a1-7d320b1f1178@github.com> <naoq1y802BM2oWDp-qLDcaGWfHUx9egRUNgeNXoFuOM=.3a96c815-ddfb-4b32-8d35-29de94db7295@github.com> <E4vmiApqmu80hBu0GrQlPdpoJQt-HJellO_d_vWMKYo=.42ab8128-bfae-4c02-961a-196f625c327d@github.com> Message-ID: <O6qz1pX6yNTe8awXHtS1pX_z5PYp40KZcgO-pBDStcg=.05245d89-c76d-4102-ba18-7a9a2a48001a@github.com> On Mon, 7 Nov 2022 21:19:50 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> @iklam thanks for sharing the information and details on the future work in this space. >> >>> By patching SharedStringsStress.java with this, I can get the CA1 and OA0 regions to be not aligned by GrainBytes, but that doesn't seem to cause the test to fail. >> >> I was actually referring to CA0 and CA1 in my figures (which I realized was not clear in my explanation earlier). 
>> Anyway, I now understand the existing mechanism works fine because the following conditions are maintained (which you have already mentioned in your comment): >> 1. G1 regions are at least 1MB, and are always a power of 2. >> 2. At dump time the objects are placed such that they do not cross `HeapRegion::min_region_size_in_words()` which I believe is 1M. >> >> Because of these two constraints, change in G1 region size at run time cannot result in objects crossing the region boundary. >> So if I update the G1 code such that at run time the regions are mapped at 1M boundary then I can get rid of the problem of objects crossing region boundary and the two tests also pass. >> >>> In any case, I think we can consider first changing the way the regions are written ([JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344)) so that they can be more easily mapped by various collectors. >> >> I agree ([JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344)) would make it easier to map them at run time and would be happy to contribute to it anyway possible. But again, that's a GC policy specific implementation detail. >> I guess you would agree we need to de-couple the CDS code from the GC policy details. While JDK-8296344 aims at decoupling the code at dump time, my aim with this PR is to achieve the same at run time by having GC-agnostic APIs. >> Moreover, the dump time mechanism should not affect the APIs used for mapping regions at run time (though the implementation may need to be adjusted). >> So, with this in mind do you think we can continue working on this PR, or do you believe the GC APIs this PR proposes to add would not be sufficient once JDK-8296344 is implemented? >> >>> (Also, tactically, we should probably first change G1 to use the new "Uniform API" you are thinking about, but leave the other collectors unchanged. This way, we can gradually test things out and fix the other collectors in subsequent RFEs). >> >> That makes sense. 
Ideally I should have done the implementation for other collectors in separate RFEs. But I was worried about whether the new APIs are flexible enough to support other non-G1 policies, and in an attempt to verify that I added the support for those policies as well. If it helps I can remove those commits and deliver them later in subsequent RFEs. > >> While JDK-8296344 aims at decoupling the code at dump time, my aim with this PR is to achieve the same at run time by having GC-agnostic APIs. Moreover, the dump time mechanism should not affect the APIs used for mapping regions at run time (though the implementation may need to be adjusted). > > I think it depends on how we want to change the dump time operations. If we decide to go with a single contiguous block, then the API for mapping this block into the runtime heap will look very different than what you have today: > > > bool ArchiveHeapLoader::get_heap_range_for_archive_regions(ArchiveHeapRegions* heap_regions, bool is_open) { > if (Universe::heap()->alloc_archive_regions(heap_regions->dumptime_regions(), > heap_regions->num_regions(), > heap_regions->runtime_regions(), > is_open)) { > > > Also, we should probably record the region boundary information in the archived objects. Something like "objects never span across 1MB boundaries". This may need to be passed to the runtime mapping API, so incompatible collectors (i.e., one uses 512KB regions) can reject the archived objects. > > One of my goals for JDK-8296344 is to optimize the archived objects for the collector chosen at dump time. For example, if you dump with SerialGC, the archived objects can be mapped efficiently without relocation when SerialGC is also chosen at run time, but may require relocation if G1 is chosen at run time. I am not sure how this would affect the runtime mapping API. Maybe some sort of preference would need to be indicated. > > I think it would be best for us to think about the whole picture before committing to a design.
Timing wise, I think we missed the JDK 20 release anyway, so we should have plenty of time to come up with a good design for JDK 21. > > I also would like to hear from folks in our GC team. @tschatzl @stefank > @iklam you mentioned [here](https://github.com/openjdk/jdk/pull/10970#issuecomment-1302600242) that you were not able to run the tests with -agentvm option using fastdebug build. It's a long shot but I am wondering if you still remember what error you were getting and which tests you were running. I lost my old build so I rebuilt from version [565e6ff](https://github.com/openjdk/jdk/pull/10970/commits/565e6ffd68d67a94a3ffff76734005492b98dd9d) and I couldn't reproduce the problem anymore. Maybe I had a glitch in my build. Sorry for the noise. ------------- PR: https://git.openjdk.org/jdk/pull/10970 From dholmes at openjdk.org Mon Nov 21 07:03:17 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 21 Nov 2022 07:03:17 GMT Subject: RFR: 8297106: Remove the -Xcheck:jni local reference capacity checking Message-ID: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> This PR removes the "fake" planned capacity checking mechanism. Please see the JBS issue for the detailed discussion. Testing: tiers 1-3 Thanks. ------------- Commit messages: - Forgot to commit removed test.
- 8297106: Remove the -Xcheck:jni local reference capacity checking Changes: https://git.openjdk.org/jdk/pull/11259/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11259&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297106 Stats: 151 lines in 5 files changed: 0 ins; 150 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11259.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11259/head:pull/11259 PR: https://git.openjdk.org/jdk/pull/11259 From dholmes at openjdk.org Mon Nov 21 07:07:46 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 21 Nov 2022 07:07:46 GMT Subject: RFR: 8297106: Remove the -Xcheck:jni local reference capacity checking [v2] In-Reply-To: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> References: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> Message-ID: <1W2AvoUKcm7wu4lvjQStwCNMO0HHKihSYxDp0N5nlWA=.24ad36e7-d82b-4edd-9423-da44c973faa4@github.com> > This PR removes the "fake" planned capacity checking mechanism. Please see the JBS issue for the detailed discussion. > > Testing: tiers 1-3 > > Thanks. David Holmes has updated the pull request incrementally with two additional commits since the last revision: - Removed additional test that no longer applies. - Forgot to commit deleted test file. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/11259/files - new: https://git.openjdk.org/jdk/pull/11259/files/73a0f2dd..3dd0ec0f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11259&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11259&range=00-01 Stats: 134 lines in 2 files changed: 0 ins; 134 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11259.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11259/head:pull/11259 PR: https://git.openjdk.org/jdk/pull/11259 From shade at openjdk.org Mon Nov 21 07:25:30 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 21 Nov 2022 07:25:30 GMT Subject: Integrated: 8294591: Fix cast-function-type warning in TemplateTable In-Reply-To: <F9kScSH8YZ9kK8XAcl5pPPE6sPrHCBUc_zNriSaP8EU=.d5c7452c-72ed-4f41-9dea-5bd8a7fccbc0@github.com> References: <F9kScSH8YZ9kK8XAcl5pPPE6sPrHCBUc_zNriSaP8EU=.d5c7452c-72ed-4f41-9dea-5bd8a7fccbc0@github.com> Message-ID: <y5mp4Q1G8zFSA5nztte6rnSnSnYQiBY9kXlwxrGd9Eo=.db3a9003-b858-4a5f-9e84-63b669789e64@github.com> On Thu, 29 Sep 2022 16:05:06 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > After [JDK-8294314](https://bugs.openjdk.org/browse/JDK-8294314), we would have `templateTable.cpp` excluded with cast-function-type warning. The underlying cause for it is casting functions for `ldc` bytecodes, which take `bool`-typed handlers: This pull request has now been integrated. 
Changeset: fc616588 Author: Aleksey Shipilev <shade at openjdk.org> URL: https://git.openjdk.org/jdk/commit/fc616588c1bf731150a9d9b80033bb589bcb231f Stats: 46 lines in 9 files changed: 6 ins; 1 del; 39 mod 8294591: Fix cast-function-type warning in TemplateTable Reviewed-by: ihse, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/10493 From chagedorn at openjdk.org Mon Nov 21 07:27:08 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Nov 2022 07:27:08 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v6] In-Reply-To: <Npv8aIJVyGBNXUAtEStPlQfbUUaOsGjxulLp209l2bQ=.35ab9699-da60-4171-b04f-469c8d5f793a@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <Npv8aIJVyGBNXUAtEStPlQfbUUaOsGjxulLp209l2bQ=.35ab9699-da60-4171-b04f-469c8d5f793a@github.com> Message-ID: <0dR70peukvwMgAwu0e7J9imsYk4jpobjMJb2EPr2PTA=.7b0c4c87-9c45-4fd9-a4b3-205159ee28b1@github.com> On Fri, 18 Nov 2022 16:06:38 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> The DWARF debugging symbols emitted by Clang are different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least.
Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused local variable Thanks Thomas for your review and your feedback! Testing looked good. @tschatzl do you also agree with the new approach? ------------- PR: https://git.openjdk.org/jdk/pull/10287 From dholmes at openjdk.org Mon Nov 21 07:35:15 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 21 Nov 2022 07:35:15 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 In-Reply-To: <xX_LEhINRUIysQQQruq5udiX7HdAynfpTq_gllIVyaQ=.9475fdf8-27e6-414d-a992-9e20e761c5ca@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <klNVgLaAprREVI2aALAP1V9p7KHz_B2pyUhoFBJgqvo=.6742030d-5184-44e6-9b03-0c59c2a8d8a6@github.com> <xX_LEhINRUIysQQQruq5udiX7HdAynfpTq_gllIVyaQ=.9475fdf8-27e6-414d-a992-9e20e761c5ca@github.com> Message-ID: <RF8vmCWNTsZVn5SAqR03YdbTh9xnQoNRSxPE4TlyhR8=.a677cfa4-c33e-4326-a3ee-95377632884a@github.com> On Mon, 21 Nov 2022 06:14:44 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Please don't add uses of `jio_snprintf` or `::snprintf` to hotspot. Use `os::snprintf`. 
>> >> Regarding `jio_snprintf`, see https://bugs.openjdk.org/browse/JDK-8198918. >> Regarding `os::snprintf` and `os::vsnprintf`, see https://bugs.openjdk.org/browse/JDK-8285506. >> >> I think the only reason we haven't marked `::sprintf` and `::snprintf` forbidden >> (FORBID_C_FUNCTION) is there are a lot of uses, and nobody has gotten around >> to dealing with it. `::snprintf` in the list of candidates for >> https://bugs.openjdk.org/browse/JDK-8214976, some of which have already been >> marked. But I don't see new bugs for the as-yet unmarked ones. >> >> As a general note, as a reviewer my preference is against non-trivial and >> persnickety code changes that are scattered all over the code base. For >> something like this I'd prefer multiple more bite-sized changes that were >> dealing with specific uses. I doubt everyone agrees with me though. > > @kimbarrett, did you have further comments? I'm going to integrate this update this week. Please let me know if you need more time. Thanks! @XueleiFan AFAICS only @tstuefe has reviewed the version where you check the return value, and as such you need a second reviewer for this nominal final version of the fix. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Mon Nov 21 07:49:06 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 21 Nov 2022 07:49:06 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v6] In-Reply-To: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> Message-ID: <tqs3sFTNYX7KXa0hzfcLmOgcFhtJsZfn8aJ2jHiJpO0=.4ab886e4-1f83-4236-9844-000862f1c7d7@github.com> > This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. 
> > To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. > > --- > > Patch > > - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. > - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. > - Removed a stray newline from print_native_stack to clean output. > - added regression testing for this feature. I removed my name from the test since we don't do this anymore. > - added clarifying comments to the test and code > - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) > > Output looks like this: > > > $ java ... -XX:+ErrorLogSecondaryErrorDetails > > > will produce, for secondary errors, siginfo and call stack. 
> > > [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] > [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] > [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) > V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) > V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) > V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) > V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) > V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) > C [libc.so.6+0x43090] > V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) > C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) > C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) > ] Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - Tone down requirement for successful regression test - Fix bug where crashing would not reset recursion; add stack frame limit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11118/files - new: https://git.openjdk.org/jdk/pull/11118/files/984de277..29f97e7e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11118&range=04-05 Stats: 40 lines in 4 files changed: 13 ins; 8 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/11118.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11118/head:pull/11118 PR: https://git.openjdk.org/jdk/pull/11118 From aboldtch at openjdk.org Mon 
Nov 21 08:11:19 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 21 Nov 2022 08:11:19 GMT Subject: RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v6] In-Reply-To: <tqs3sFTNYX7KXa0hzfcLmOgcFhtJsZfn8aJ2jHiJpO0=.4ab886e4-1f83-4236-9844-000862f1c7d7@github.com> References: <x1kul17oEaJ-UX6ZGPv8OcsxN8QfYlME_d39aSATk-Q=.f0e0169c-15f5-4b4b-8235-68f8e68ec43c@github.com> <tqs3sFTNYX7KXa0hzfcLmOgcFhtJsZfn8aJ2jHiJpO0=.4ab886e4-1f83-4236-9844-000862f1c7d7@github.com> Message-ID: <2KrGmH4F3rOZ-JZaR4kkqN6JNNAtr6uRrEKg1Y70pPk=.b9e17c24-b1a9-4b92-bb21-3d3697ed9629@github.com> On Mon, 21 Nov 2022 07:49:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017. >> >> To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. >> >> --- >> >> Patch >> >> - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`. >> - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead. >> - Removed a stray newline from print_native_stack to clean output. >> - added regression testing for this feature. I removed my name from the test since we don't do this anymore. >> - added clarifying comments to the test and code >> - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895) >> >> Output looks like this: >> >> >> $ java ... 
-XX:+ErrorLogSecondaryErrorDetails >> >> >> will produce, for secondary errors, siginfo and call stack. >> >> >> [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61] >> [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000] >> [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946) >> V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564) >> V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709) >> V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467) >> V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473) >> V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656) >> C [libc.so.6+0x43090] >> V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662) >> C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457) >> C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650) >> ] > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Tone down requirement for successful regression test > - Fix bug where crashing would not reset recursion; add stack frame limit lgtm. src/hotspot/share/utilities/vmError.cpp line 355: > 353: if (fr.pc()) { > 354: st->print_cr("Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)"); > 355: const int limit = max_frames == -1 ? StackPrintLimit : max_frames; Only thing is that we may now print more than StackPrintLimit. Maybe clamping max_frames is more appropriate. 
But I do not see a reason why someone would lower StackPrintLimit and use ErrorLogSecondaryErrorDetails. ------------- Marked as reviewed by aboldtch (Committer). PR: https://git.openjdk.org/jdk/pull/11118 From tschatzl at openjdk.org Mon Nov 21 08:43:41 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 21 Nov 2022 08:43:41 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v6] In-Reply-To: <Npv8aIJVyGBNXUAtEStPlQfbUUaOsGjxulLp209l2bQ=.35ab9699-da60-4171-b04f-469c8d5f793a@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> <Npv8aIJVyGBNXUAtEStPlQfbUUaOsGjxulLp209l2bQ=.35ab9699-da60-4171-b04f-469c8d5f793a@github.com> Message-ID: <JepS4RFI09FbVwfY2l10r2AoNt_AChjtka0z8s85N2U=.bd5814f8-78f4-4a66-b289-e16db4715211@github.com> On Fri, 18 Nov 2022 16:06:38 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: >> The DWARF debugging symbols emitted by Clang are different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach.
We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused local variable Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10287 From ngasson at openjdk.org Mon Nov 21 09:21:26 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Mon, 21 Nov 2022 09:21:26 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v4] In-Reply-To: <7dgYny7behfzrurUf7PH-jfhKYdDIMV4hInwIlZwg-Y=.9a97cf5b-fe58-4855-911b-43ec710c9539@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <7dgYny7behfzrurUf7PH-jfhKYdDIMV4hInwIlZwg-Y=.9a97cf5b-fe58-4855-911b-43ec710c9539@github.com> Message-ID: <ysGrVsVrKfy29pKxNB9rSWZ72WCXHVJkUL4MMUS6pl4=.9a49d61a-7e59-4017-a890-0c300470396d@github.com> On Mon, 21 Nov 2022 06:28:32 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. 
Intrinsics have been written for the following platforms and instruction sets: >> >> - x86_64: AVX, AVX2 and AVX512 >> - aarch64: platforms that support the advanced SIMD instructions >> >> Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. >> >> Special thanks to the folks who have made many helpful comments while this PR was in draft form. > > Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: > > Pull out common macro code into function parameter pack Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/7702 From ngasson at openjdk.org Mon Nov 21 09:21:29 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Mon, 21 Nov 2022 09:21:29 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v3] In-Reply-To: <emKiLeQ71GF0MnhnB12gWSYRIg7ZSe8Efl0tnxPv300=.2c03f75a-2e3f-4708-a864-557119222cfa@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <L1ZTSckdP_iY9bvifRX-00Qqw6VLq3sWU9k93TrLW_w=.fc29ffb4-78dc-4d31-b36c-c296e3f48a91@github.com> <emKiLeQ71GF0MnhnB12gWSYRIg7ZSe8Efl0tnxPv300=.2c03f75a-2e3f-4708-a864-557119222cfa@github.com> Message-ID: <pvu4K9OI9qAaL5tZG_IYJQ7puDgafOS7wEp5R8MSZJI=.1f658db1-9d77-4e12-9738-1ecc55929025@github.com> On Thu, 17 Nov 2022 18:50:48 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: >> >> replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations > > Another pair of arm-knowledgeable eyes on this is always welcome! AArch64 code looks OK to me, and I believe @jnimeh already discussed the implementation with one of my colleagues who works on crypto optimisation at Arm. 
------------- PR: https://git.openjdk.org/jdk/pull/7702 From eosterlund at openjdk.org Mon Nov 21 09:58:01 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 21 Nov 2022 09:58:01 GMT Subject: RFR: 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing Message-ID: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com> There is a stack walk in JvmtiExport::post_exception_throw() that has safepoints in it. This trips up the stack watermark code. This patch adds a RAII object to JvmtiExport::post_exception_throw() that keeps the thread and its stack fully processed throughout the function. Testing: tier1-7 of ZGC tests on linux x86_64 debug and manual testing of the test that failed. ------------- Commit messages: - Remove trailing whitespace - 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing Changes: https://git.openjdk.org/jdk/pull/11238/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11238&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294924 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11238.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11238/head:pull/11238 PR: https://git.openjdk.org/jdk/pull/11238 From rkennke at openjdk.org Mon Nov 21 11:07:22 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 21 Nov 2022 11:07:22 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity Message-ID: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements.
Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. Testing: - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) - [x] tier1 (x86_64, x86_32, aarch64, riscv) - [x] tier2 (x86_64, aarch64, riscv) - [x] tier3 (x86_64, riscv) ------------- Commit messages: - More PPC fixes - Merge branch 'master' into JDK-8139457 - Revert BytesPerWord/BytesPerInt change in RISCV - More RISCV fixes - Add test to verify array base offset - RISCV parts - PPC parts - s390 parts - Arm parts - Aarch64 parts - ... and 1 more: https://git.openjdk.org/jdk/compare/e81359f1...42caf4bd Changes: https://git.openjdk.org/jdk/pull/11044/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8139457 Stats: 246 lines in 26 files changed: 142 ins; 43 del; 61 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From shade at openjdk.org Mon Nov 21 11:07:22 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 21 Nov 2022 11:07:22 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> Message-ID: <xL3p1VBbo9_9o2--RChNo0OMO3PV3KEfNLCeCrBHwgM=.20ec1ff5-b91a-4690-960b-59556bc57a50@github.com> On Tue, 8 Nov 2022 20:18:09 GMT, Roman Kennke <rkennke at openjdk.org> wrote: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. 
> > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) ARM32 seems to build well, passes `runtime/FieldLayout` tests, and bootcycles. RISC-V needs more work: $ make images test TEST=runtime/FieldLayout # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (0xe0000000), pid=454832, tid=454835 # stop: len is not a multiple of BytesPerWord # # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.shade.shipilev-jdk) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.shade.shipilev-jdk, mixed mode, tiered, compressed oops, g1 gc, linux-riscv64) # Problematic frame: # J 78 c1 java.util.Arrays.copyOfRange([BII)[B java.base at 20-internal (64 bytes) @ 0x0000003f84c8bf2c [0x0000003f84c8bdc0+0x000000000000016c] ------------- PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Mon Nov 21 11:07:23 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 21 Nov 2022 11:07:23 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <xL3p1VBbo9_9o2--RChNo0OMO3PV3KEfNLCeCrBHwgM=.20ec1ff5-b91a-4690-960b-59556bc57a50@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> 
<xL3p1VBbo9_9o2--RChNo0OMO3PV3KEfNLCeCrBHwgM=.20ec1ff5-b91a-4690-960b-59556bc57a50@github.com> Message-ID: <CkdHPqNdmDL5vCICSWTHO1MjP4_DwMPXAT90lX3Vzww=.b1981b5b-91b3-465d-b3ef-868a8a16fbe7@github.com> On Thu, 10 Nov 2022 18:38:09 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > RISC-V needs more work: > > ``` > $ make images test TEST=runtime/FieldLayout > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (0xe0000000), pid=454832, tid=454835 > # stop: len is not a multiple of BytesPerWord > # > # JRE version: OpenJDK Runtime Environment (20.0) (fastdebug build 20-internal-adhoc.shade.shipilev-jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-internal-adhoc.shade.shipilev-jdk, mixed mode, tiered, compressed oops, g1 gc, linux-riscv64) > # Problematic frame: > # J 78 c1 java.util.Arrays.copyOfRange([BII)[B java.base at 20-internal (64 bytes) @ 0x0000003f84c8bf2c [0x0000003f84c8bdc0+0x000000000000016c] > ``` Thanks for trying this. It may be enough to change the assert to check for BytesPerInt multiple instead. 
Something like the following: ` diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp index 91833b662e2..107e4cfcedd 100644 --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp @@ -4215,9 +4215,9 @@ void MacroAssembler::zero_memory(Register addr, Register len, Register tmp) { #ifdef ASSERT { Label L; - andi(t0, len, BytesPerWord - 1); + andi(t0, len, BytesPerInt - 1); beqz(t0, L); - stop("len is not a multiple of BytesPerWord"); + stop("len is not a multiple of BytesPerInt"); bind(L); } #endif // ASSERT ` ------------- PR: https://git.openjdk.org/jdk/pull/11044 From fyang at openjdk.org Mon Nov 21 11:07:23 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 21 Nov 2022 11:07:23 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> Message-ID: <phdXKRUyQloaow8TxB0-vORpM_ihahr9MFj_bKZLy80=.9a1fe6ef-2f9b-4237-9186-a2a113a69631@github.com> On Tue, 8 Nov 2022 20:18:09 GMT, Roman Kennke <rkennke at openjdk.org> wrote: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. 
> > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Hi, you might need one extra change for riscv in order to pass this test: ./test/hotspot/jtreg/runtime/FieldLayout/ArrayBaseOffsets.java But I haven't perform full test for all these changes on riscv. diff --git a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp index 5989d5ab809..9dced7c53e9 100644 --- a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp +++ b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp @@ -177,6 +177,13 @@ void C1_MacroAssembler::initialize_body(Register obj, Register len_in_bytes, int sub(len_in_bytes, len_in_bytes, hdr_size_in_bytes); beqz(len_in_bytes, done); + // Zero first 4 bytes, if start offset is not word aligned. + if (!is_aligned(hdr_size_in_bytes, BytesPerWord)) { + sw(zr, Address(obj, hdr_size_in_bytes)); + sub(len_in_bytes, len_in_bytes, BytesPerInt); + hdr_size_in_bytes += BytesPerInt; + } + // Preserve obj if (hdr_size_in_bytes) { add(obj, obj, hdr_size_in_bytes); ------------- PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Mon Nov 21 11:07:24 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 21 Nov 2022 11:07:24 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <phdXKRUyQloaow8TxB0-vORpM_ihahr9MFj_bKZLy80=.9a1fe6ef-2f9b-4237-9186-a2a113a69631@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> <phdXKRUyQloaow8TxB0-vORpM_ihahr9MFj_bKZLy80=.9a1fe6ef-2f9b-4237-9186-a2a113a69631@github.com> Message-ID: <qryFl_aH4wBPGO18vAGUTMsrNFS-tTn0zA-G4d6tH94=.5429be39-6afb-4f32-813a-06a33920593e@github.com> On Fri, 11 Nov 2022 08:17:19 GMT, Fei Yang <fyang at openjdk.org> 
wrote: > Hi, you might need one extra change for riscv in order to pass this test: ./test/hotspot/jtreg/runtime/FieldLayout/ArrayBaseOffsets.java But I haven't perform full test for all these changes on riscv. > > ``` > diff --git a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp > index 5989d5ab809..9dced7c53e9 100644 > --- a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp > +++ b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp > @@ -177,6 +177,13 @@ void C1_MacroAssembler::initialize_body(Register obj, Register len_in_bytes, int > sub(len_in_bytes, len_in_bytes, hdr_size_in_bytes); > beqz(len_in_bytes, done); > > + // Zero first 4 bytes, if start offset is not word aligned. > + if (!is_aligned(hdr_size_in_bytes, BytesPerWord)) { > + sw(zr, Address(obj, hdr_size_in_bytes)); > + sub(len_in_bytes, len_in_bytes, BytesPerInt); > + hdr_size_in_bytes += BytesPerInt; > + } > + > // Preserve obj > if (hdr_size_in_bytes) { > add(obj, obj, hdr_size_in_bytes); > ``` Thanks for checking and providing the fix, Fei! I pushed those changes and updated the test matrix accordingly. 
------------- PR: https://git.openjdk.org/jdk/pull/11044 From fyang at openjdk.org Mon Nov 21 11:07:24 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 21 Nov 2022 11:07:24 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <qryFl_aH4wBPGO18vAGUTMsrNFS-tTn0zA-G4d6tH94=.5429be39-6afb-4f32-813a-06a33920593e@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> <phdXKRUyQloaow8TxB0-vORpM_ihahr9MFj_bKZLy80=.9a1fe6ef-2f9b-4237-9186-a2a113a69631@github.com> <qryFl_aH4wBPGO18vAGUTMsrNFS-tTn0zA-G4d6tH94=.5429be39-6afb-4f32-813a-06a33920593e@github.com> Message-ID: <StZT6oPHqLugNDSvlbLJw3_W9FhjjJDP0n6lNmZJXSg=.50c0e67f-f789-455e-b891-9cb0690fcd27@github.com> On Fri, 11 Nov 2022 10:08:08 GMT, Roman Kennke <rkennke at openjdk.org> wrote: > > Hi, you might need one extra change for riscv in order to pass this test: ./test/hotspot/jtreg/runtime/FieldLayout/ArrayBaseOffsets.java But I haven't perform full test for all these changes on riscv. > > ``` > > diff --git a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp > > index 5989d5ab809..9dced7c53e9 100644 > > --- a/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp > > +++ b/src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp > > @@ -177,6 +177,13 @@ void C1_MacroAssembler::initialize_body(Register obj, Register len_in_bytes, int > > sub(len_in_bytes, len_in_bytes, hdr_size_in_bytes); > > beqz(len_in_bytes, done); > > > > + // Zero first 4 bytes, if start offset is not word aligned. > > + if (!is_aligned(hdr_size_in_bytes, BytesPerWord)) { > > + sw(zr, Address(obj, hdr_size_in_bytes)); > > + sub(len_in_bytes, len_in_bytes, BytesPerInt); > > + hdr_size_in_bytes += BytesPerInt; > > + } > > + > > // Preserve obj > > if (hdr_size_in_bytes) { > > add(obj, obj, hdr_size_in_bytes); > > ``` > > Thanks for checking and providing the fix, Fei! 
I pushed those changes and updated the test matrix accordingly. With my proposed fix, I don't think you need the following change made in file: src/hotspot/cpu/riscv/macroAssembler_riscv.cpp #ifdef ASSERT { Label L; - andi(t0, len, BytesPerWord - 1); + andi(t0, len, BytesPerInt - 1); beqz(t0, L); - stop("len is not a multiple of BytesPerWord"); + stop("len is not a multiple of BytesPerInt"); bind(L); } #endif // ASSERT Could you please remove this change from this PR? I am running some tests on my linux-riscv64 platform. ------------- PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Mon Nov 21 11:07:24 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 21 Nov 2022 11:07:24 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> Message-ID: <RFvJgF50t3dA4CZO-u_OZvSiN4OilgSvKmZPJCmBW58=.bc6abd6b-ec90-447e-95ab-454675f762c0@github.com> On Tue, 8 Nov 2022 20:18:09 GMT, Roman Kennke <rkennke at openjdk.org> wrote: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. 
> > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) > With my proposed fix, I don't think you need the following change made in file: src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > > Could you please remove this change from this PR? I am running some tests on my linux-riscv64 platform. Ok, I reverted that part. Could you test that? Also, if you're running any of the tests mentioned in the PR, can you let me know and I'll update the test matrix. Thanks, Roman ------------- PR: https://git.openjdk.org/jdk/pull/11044 From fyang at openjdk.org Mon Nov 21 11:07:24 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 21 Nov 2022 11:07:24 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <RFvJgF50t3dA4CZO-u_OZvSiN4OilgSvKmZPJCmBW58=.bc6abd6b-ec90-447e-95ab-454675f762c0@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> <RFvJgF50t3dA4CZO-u_OZvSiN4OilgSvKmZPJCmBW58=.bc6abd6b-ec90-447e-95ab-454675f762c0@github.com> Message-ID: <5YN4LhQP0m470tyb78pqxmAxVurHGcNiFrEzGQQY9BM=.e145b7b4-db2a-4cbd-b7c9-07fd61d57120@github.com> On Thu, 17 Nov 2022 08:30:45 GMT, Roman Kennke <rkennke at openjdk.org> wrote: > > With my proposed fix, I don't think you need the following change made in file: src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > > Could you please remove this change from this PR? I am running some tests on my linux-riscv64 platform. > > Ok, I reverted that part. Could you test that? Also, if you're running any of the tests mentioned in the PR, can you let me know and I'll update the test matrix. > > Thanks, Roman Hi, Thanks for the update. This has passed tier1-3 tests on my linux-riscv64 hifive unmatched boards.
------------- PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Mon Nov 21 11:07:24 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 21 Nov 2022 11:07:24 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <5YN4LhQP0m470tyb78pqxmAxVurHGcNiFrEzGQQY9BM=.e145b7b4-db2a-4cbd-b7c9-07fd61d57120@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> <RFvJgF50t3dA4CZO-u_OZvSiN4OilgSvKmZPJCmBW58=.bc6abd6b-ec90-447e-95ab-454675f762c0@github.com> <5YN4LhQP0m470tyb78pqxmAxVurHGcNiFrEzGQQY9BM=.e145b7b4-db2a-4cbd-b7c9-07fd61d57120@github.com> Message-ID: <0KqlWWM5b7GEpVrvXdMPadtXLSyFnqclSUhKaUtzpwQ=.7fe815f5-498b-45fb-b318-e08cfc445ff9@github.com> On Thu, 17 Nov 2022 08:59:45 GMT, Fei Yang <fyang at openjdk.org> wrote: > Hi, Thanks for the update. This has passed tier1-3 tests on my linux-riscv64 hifive unmatched boards. Thanks, Fei! This is very appreciated! ------------- PR: https://git.openjdk.org/jdk/pull/11044 From stuefe at openjdk.org Mon Nov 21 11:07:25 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 21 Nov 2022 11:07:25 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> Message-ID: <JfL093FbUTTOwic9KHMAMat13oqJaTRf-4Le98D8kdc=.b49fe9be-4cf6-4d29-87ee-6a945ef5c2c0@github.com> On Tue, 8 Nov 2022 20:18:09 GMT, Roman Kennke <rkennke at openjdk.org> wrote: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. 
> > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) This should make it work on ppc. thomas at starfish:/shared/projects/openjdk/jdk-jdk/source$ git diff diff --git a/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp index 87b87e83e1a..4420c2ac4ca 100644 --- a/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp +++ b/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp @@ -361,6 +361,16 @@ void C1_MacroAssembler::allocate_array( const Register index = t3; addi(base, obj, base_offset_in_bytes); // compute address of first element addi(index, arr_size, -(base_offset_in_bytes)); // compute index = number of bytes to clear + + // Elements are not dword aligned. Zero out leading word. + if (!is_aligned(base_offset_in_bytes, BytesPerWord)) { + assert(is_aligned(base_offset_in_bytes, BytesPerInt), "weird alignment"); + li(t1, 0); + stw(t1, 0, base); + addi(base, base, BytesPerInt); + // Note: initialize_body will align index down, no need to correct it here. + } + initialize_body(base, index); if (CURRENT_ENV->dtrace_alloc_probes()) { I did the zero-ing out up in `allocate_array` since I did not want to affect the object allocation path. I ran several tests manually with and without UseCCP. Your test case runs also through. Our hardware is a bottleneck though, and currently Richard is using our test queue with his PPC Loom port. Therefore it may take a while until I manage to run more tests. I tested s390, and it seems to work without a change. 
Did multiple tests with -UseCCP, as well as your test case. I think it works out of the box since C1_MacroAssembler::initialize_body() uses MVCLE to zero out memory, and that instruction works at the byte level, so no alignment restrictions for input pointers. src/hotspot/share/oops/arrayOop.hpp line 77: > 75: return !UseCompressedOops; > 76: } > 77: #endif I'm confused why this is not needed today? ------------- PR: https://git.openjdk.org/jdk/pull/11044 From stuefe at openjdk.org Mon Nov 21 11:07:25 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 21 Nov 2022 11:07:25 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <JfL093FbUTTOwic9KHMAMat13oqJaTRf-4Le98D8kdc=.b49fe9be-4cf6-4d29-87ee-6a945ef5c2c0@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> <JfL093FbUTTOwic9KHMAMat13oqJaTRf-4Le98D8kdc=.b49fe9be-4cf6-4d29-87ee-6a945ef5c2c0@github.com> Message-ID: <x_Brg-pZhUeueUAmj3nLZIrWbKkuShOJfuCtXU_WBV4=.3770bb29-04b3-408c-aeb0-53c4833f113a@github.com> On Thu, 17 Nov 2022 12:54:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > I did the zero-ing out up in `allocate_array` since I did not want to affect the object allocation path. About that, I see that other platforms zero out the leading bytes in `initialize_body`, but does that not mean that we now do a pointless store whenever we initialize a variable-sized object in +UseCCP mode with 12byte headers? 
------------- PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Mon Nov 21 11:07:25 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 21 Nov 2022 11:07:25 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <x_Brg-pZhUeueUAmj3nLZIrWbKkuShOJfuCtXU_WBV4=.3770bb29-04b3-408c-aeb0-53c4833f113a@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> <JfL093FbUTTOwic9KHMAMat13oqJaTRf-4Le98D8kdc=.b49fe9be-4cf6-4d29-87ee-6a945ef5c2c0@github.com> <x_Brg-pZhUeueUAmj3nLZIrWbKkuShOJfuCtXU_WBV4=.3770bb29-04b3-408c-aeb0-53c4833f113a@github.com> Message-ID: <yPPc3a8Nzp1B7eZ6DxqcAdE8hW6tdK36qXVmoK-n_Fo=.cc386eca-a4fb-4d9b-97a8-69b33381f027@github.com> On Thu, 17 Nov 2022 12:56:04 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > > I did the zero-ing out up in `allocate_array` since I did not want to affect the object allocation path. > > About that, I see that other platforms zero out the leading bytes in `initialize_body`, but does that not mean that we now do a pointless store whenever we initialize a variable-sized object in +UseCCP mode with 12byte headers? I don't think so. initialize_object() always passes an aligned offset to initialize_body() because that gap at offset 12 is handled by initialize_header() already. 
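For readers following the alignment discussion, the leading-store pattern from the RISC-V and PPC changes quoted earlier in this thread can be sketched in portable C++. This is only an illustration under stated assumptions, not HotSpot code: `zero_body`, `kBytesPerWord` and `kBytesPerInt` are hypothetical names, `std::memset` stands in for the `sw`/`stw` and word-sized store instructions, the start offset is assumed to be 4-byte aligned, and the end offset is assumed to be 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for HotSpot's BytesPerWord / BytesPerInt on a 64-bit VM.
constexpr std::size_t kBytesPerWord = 8;
constexpr std::size_t kBytesPerInt = 4;

// True if value is a multiple of alignment (alignment must be a power of two).
constexpr bool is_aligned(std::size_t value, std::size_t alignment) {
  return (value & (alignment - 1)) == 0;
}

// Zeroes the bytes [start_offset, end_offset) of obj.
// If the start is not word aligned, one 4-byte store clears the leading gap,
// after which the remainder is cleared with word-sized stores only.
// Returns true when the extra leading 4-byte store was needed.
inline bool zero_body(unsigned char* obj, std::size_t start_offset,
                      std::size_t end_offset) {
  assert(is_aligned(start_offset, kBytesPerInt));
  assert(is_aligned(end_offset, kBytesPerWord));

  bool leading_store = false;
  std::size_t offset = start_offset;
  if (!is_aligned(offset, kBytesPerWord)) {
    std::memset(obj + offset, 0, kBytesPerInt);  // "zero first 4 bytes"
    offset += kBytesPerInt;
    leading_store = true;
  }
  for (; offset < end_offset; offset += kBytesPerWord) {
    std::memset(obj + offset, 0, kBytesPerWord);  // word-sized clear
  }
  return leading_store;
}
```

In this sketch, a body that starts at offset 20 takes the extra 4-byte store before the word loop, while a body that starts at offset 16 goes straight to the word loop, which mirrors the point that an already aligned start offset costs no additional store.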
------------- PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Mon Nov 21 11:07:25 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 21 Nov 2022 11:07:25 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: <JfL093FbUTTOwic9KHMAMat13oqJaTRf-4Le98D8kdc=.b49fe9be-4cf6-4d29-87ee-6a945ef5c2c0@github.com> References: <vq1DJY-YpIUdsfGAw0ibRlCe84GxroK6y0z2MIVAjb4=.b45385cc-7d1b-4789-8344-2383911705ff@github.com> <JfL093FbUTTOwic9KHMAMat13oqJaTRf-4Le98D8kdc=.b49fe9be-4cf6-4d29-87ee-6a945ef5c2c0@github.com> Message-ID: <FzdMh18Aweb4zLOylEMBtqYPJOz2cu19mNUos6ipt_g=.95b9f9cc-970f-4a09-8e94-fbc2d33d2424@github.com> On Thu, 17 Nov 2022 13:37:56 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. >> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > src/hotspot/share/oops/arrayOop.hpp line 77: > >> 75: return !UseCompressedOops; >> 76: } >> 77: #endif > > I'm confused why this is not needed today? > Today this is only used by typeArrayOops, afaict. 
------------- PR: https://git.openjdk.org/jdk/pull/11044 From pchilanomate at openjdk.org Mon Nov 21 12:20:10 2022 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 21 Nov 2022 12:20:10 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v4] In-Reply-To: <dgIMUPeDO7sqZR_QCaPi6JpoZJKCNJ9-QoN97QuZ-y8=.6bf9ec32-8a59-4646-a021-d98ff53f36c4@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <dgIMUPeDO7sqZR_QCaPi6JpoZJKCNJ9-QoN97QuZ-y8=.6bf9ec32-8a59-4646-a021-d98ff53f36c4@github.com> Message-ID: <nli1ZOZrAQ_vNgtG9Rvi516a99ClHUIblH8LCoNj9ag=.a67626f1-5cac-412c-b8c1-1aba29629348@github.com> On Thu, 17 Nov 2022 09:24:04 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. >> >> In particular, >> 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. >> >> 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill.
>> >> 3) Refactoring the stack chunk allocation code >> >> Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Fix Richard comments I went through the changes and all looks good to me. Only minor comments. Thanks, Patricio src/hotspot/share/gc/shared/memAllocator.cpp line 381: > 379: } > 380: > 381: oop MemAllocator::try_allocate_in_existing_tlab() { try_allocate_in_existing_tlab() is now unused in memAllocator.hpp. src/hotspot/share/gc/shared/memAllocator.hpp line 98: > 96: virtual oop initialize(HeapWord* mem) const; > 97: > 98: using MemAllocator::allocate; Do we need these declarations? I thought this would be needed if allocate() would not be public on the base class or to avoid hiding it if here we define a method with the same name but different signature. src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1393: > 1391: // Guaranteed to be in young gen / newly allocated memory > 1392: assert(!chunk->requires_barriers(), "Unfamiliar GC requires barriers on TLAB allocation"); > 1393: _barriers = false; Do we need to explicitly set _barriers to false? It's already initialized to be false (same above for the UseZGC case). That would also allow to simplify the code a bit I think to be just an if statement that calls requires_barriers() for the "ZGC_ONLY(!UseZGC &&) (SHENANDOAHGC_ONLY(UseShenandoahGC ||) allocator.took_slow_path())" case, and then ZGC and the fast path could use just separate asserts outside conditionals. ------------- Marked as reviewed by pchilanomate (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11111 From chagedorn at openjdk.org Mon Nov 21 12:56:19 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Nov 2022 12:56:19 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v7] In-Reply-To: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> Message-ID: <8uo2nH7Ans2cQz5QGDNsgPNirtkBKJP8pnrrbr474qw=.990e13c3-96a8-49c1-92c0-912ec0e834a1@github.com> > The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: > > The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. > > Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. > > I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. 
I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Merge branch 'master' into JDK-8293422 - Remove unused local variable - Update algorithm to print char by char, skipping file separators on the fly and only caring about the actual filename (ignore prefix path when reading) - Merge branch 'master' into JDK-8293422 - Always read full filename and strip prefix path and only then cut filename to fit output buffer - Merge branch 'master' into JDK-8293422 - Merge branch 'master' into JDK-8293422 - Review comments from Thomas - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ - 8293422: DWARF emitted by Clang cannot be parsed ------------- Changes: https://git.openjdk.org/jdk/pull/10287/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10287&range=06 Stats: 162 lines in 5 files changed: 113 ins; 31 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/10287.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10287/head:pull/10287 PR: https://git.openjdk.org/jdk/pull/10287 From chagedorn at openjdk.org Mon Nov 21 13:02:27 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Nov 2022 13:02:27 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed In-Reply-To: <S8XIuc6_dJT8EHDa8qbhioeDpU_V1HtvbThkHTkG9x0=.f4cf594d-d6ae-4b1d-a8fb-196e14384362@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> 
<F6wh_mBr34PgGPZ8EqRGW5QQGUCpSIz6biESoaLylpU=.ab95056f-5351-44b4-8607-d19660ef4118@github.com> <S8XIuc6_dJT8EHDa8qbhioeDpU_V1HtvbThkHTkG9x0=.f4cf594d-d6ae-4b1d-a8fb-196e14384362@github.com> Message-ID: <_IadcH6MBdD1h_UnghBA2qvutX9OiBDl3l39yzOLtZE=.3b84e177-e17d-4040-a592-9fd4562122dc@github.com> On Wed, 21 Sep 2022 10:47:38 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: >> Thanks Thomas for that link. I was not aware of this `-gdwarf-aranges` flag. I've tried it out and it indeed seems to work. But I was not able to build with `-flto=thin` as it resulted in build failures. So, I'm not sure what would happen and if it's even possible to build with it in general. Nevertheless, I suggest to go with that `-gdwarf-aranges` flag solution for now and remove the previously suggested bailout fix for Clang. >> >> However, `-gdwarf-aranges` (and `-gdwarf-4` which I think we should also add to avoid getting the unsupported DWARF 5 format) was only added in Clang 5.0. But we must support down to 3.5 according to: >> https://github.com/openjdk/jdk/blob/cb72f80925965c73e32c44ce3196866272306d7f/doc/building.md?plain=1#L353-L354 >> >> I therefore changed the previous complete bailout fix to a bailout fix for Clang versions older than 5.0. >> >> I've noticed that Clang is emitting a full relative path for the filename in the form of `src/hotspot/share/compiler/compilerThread.cpp:58` with debug builds (it only emits the filename with release builds). I therefore added an additional method `strip_path_prefix()` to get rid of the path prefix. > >> Thanks Thomas for that link. I was not aware of this `-gdwarf-aranges` flag. I've tried it out and it indeed seems to work. But I was not able to build with `-flto=thin` as it resulted in build failures. > > I only thought that maybe we do compile with `-flto=thin` already, and so we could not use these flags. > >>So, I'm not sure what would happen and if it's even possible to build with it in general. 
Nevertheless, I suggest to go with that `-gdwarf-aranges` flag solution for now and remove the previously suggested bailout fix for Clang. > > I agree. > >> >> However, `-gdwarf-aranges` (and `-gdwarf-4` which I think we should also add to avoid getting the unsupported DWARF 5 format) was only added in Clang 5.0. But we must support down to 3.5 according to: >> >> https://github.com/openjdk/jdk/blob/cb72f80925965c73e32c44ce3196866272306d7f/doc/building.md?plain=1#L353-L354 >> >> I therefore changed the previous complete bailout fix to a bailout fix for Clang versions older than 5.0. >> >> I've noticed that Clang is emitting a full relative path for the filename in the form of `src/hotspot/share/compiler/compilerThread.cpp:58` with debug builds (it only emits the filename with release builds). I therefore added an additional method `strip_path_prefix()` to get rid of the path prefix. > > Okay. Thanks @tschatzl for reviewing it again! ------------- PR: https://git.openjdk.org/jdk/pull/10287 From chagedorn at openjdk.org Mon Nov 21 13:02:27 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Nov 2022 13:02:27 GMT Subject: Integrated: 8293422: DWARF emitted by Clang cannot be parsed In-Reply-To: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> References: <Cuo-hZ2dmp5Su2aMvYe3k_w_rEuzCk7F8wpeCsQuyMA=.911a0412-42c0-44a1-8863-e94e5a7970e7@github.com> Message-ID: <LAsyhHmTeZEwyNSvd33HwQcf3VBBV884lfLDuk2xJGY=.612b7abe-7c6b-4cfd-89df-0f9d4a4a9eb7@github.com> On Thu, 15 Sep 2022 11:59:08 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote: > The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. 
As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: > > The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. > > Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. > > I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. > > Thanks, > Christian This pull request has now been integrated. 
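The `.debug_aranges` walk described in the message above can be sketched roughly as follows. This is a simplified illustration, not the HotSpot DWARF parser: the `AddressRange` struct and the `find_compilation_unit` function are hypothetical names, and the entries are assumed to be already decoded from the section into memory.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, already-decoded view of one address range entry.
struct AddressRange {
  uint64_t start;      // first library offset covered by this range
  uint64_t length;     // size of the range in bytes
  uint64_t cu_offset;  // offset of the owning compilation unit in .debug_info
};

// Walks all entries and reports the compilation unit offset of the range
// that contains pc_offset. Returns false when no range matches, which is
// the situation described for an incomplete .debug_aranges section.
bool find_compilation_unit(const std::vector<AddressRange>& ranges,
                           uint64_t pc_offset, uint64_t* cu_offset) {
  for (const AddressRange& r : ranges) {
    // Half-open interval [start, start + length), written to avoid overflow.
    if (pc_offset >= r.start && pc_offset - r.start < r.length) {
      *cu_offset = r.cu_offset;
      return true;
    }
  }
  return false;
}
```

A caller would pass the library offset of the current pc and, on success, continue parsing `.debug_info` at the returned offset; on failure it has to bail out or fall back to another strategy.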
Changeset: 8b8d8481 Author: Christian Hagedorn <chagedorn at openjdk.org> URL: https://git.openjdk.org/jdk/commit/8b8d8481bc05eec70a1df832668322e5c17694d8 Stats: 162 lines in 5 files changed: 113 ins; 31 del; 18 mod 8293422: DWARF emitted by Clang cannot be parsed Reviewed-by: tschatzl, ihse, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/10287 From coleenp at openjdk.org Mon Nov 21 14:17:35 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 21 Nov 2022 14:17:35 GMT Subject: RFR: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call [v3] In-Reply-To: <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> <aLRqrnYT43MVZKrB_-rH6-KnkiCbSW6yhnplZK1sjOQ=.46d86a19-cc5e-4781-90ee-8d16bd14d6c8@github.com> Message-ID: <jcK_n8OR_PiiC0dxscm1v9UBJhq9MwSiwtPclLjkdCQ=.2108fa41-b5da-4b34-9c2a-49a3a82c0624@github.com> On Mon, 7 Nov 2022 20:40:33 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: >> This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. >> Tested with tier1-4, and jvmti and jdi tests locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > really revert the file Thanks David and Alan for the code review and all the help with the CSR and release notes. 
------------- PR: https://git.openjdk.org/jdk/pull/11023 From coleenp at openjdk.org Mon Nov 21 14:19:30 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 21 Nov 2022 14:19:30 GMT Subject: Integrated: 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call In-Reply-To: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> References: <WVryDic7CmF2lqt8iA23a1JEY2LXqP8LPP_m0VdrrIU=.188d87ee-f64a-40f7-b83c-a0ef343272aa@github.com> Message-ID: <1oVXMFWnlx-cv4MyUKu8M0mozrUReohL4Tg_HjcESUw=.81b0a9da-ed02-45a2-ae97-3380bc55474e@github.com> On Mon, 7 Nov 2022 17:07:01 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote: > This patch moves the acquisition of the boot class loader lock out of the JVM and into the Java function. > Tested with tier1-4, and jvmti and jdi tests locally. This pull request has now been integrated. Changeset: 5c334540 Author: Coleen Phillimore <coleenp at openjdk.org> URL: https://git.openjdk.org/jdk/commit/5c3345404d850cf01d9629b48015f1783a32bfc0 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod 8296472: Remove ObjectLocker around appendToClassPathForInstrumentation call Reviewed-by: sspitsyn, alanb, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/11023 From mariellh at spotify.com Mon Nov 21 14:21:01 2022 From: mariellh at spotify.com (Mariell Hoversholm) Date: Mon, 21 Nov 2022 15:21:01 +0100 Subject: Adding -XX option for overwriting HeapDumpPath In-Reply-To: <CAAZhyNNqk4R1yvWGViWWfOwM1=bK6k4R7sPxYu4w3js7Zeo9PA@mail.gmail.com> References: <CAAZhyNNqk4R1yvWGViWWfOwM1=bK6k4R7sPxYu4w3js7Zeo9PA@mail.gmail.com> Message-ID: <CAAZhyNNLLLX=7pJACguiMZwKJV0W32MQQt-ZDFcYW0ANBTOmwg@mail.gmail.com> Hi, Would it be possible to add a new option for overwriting the file provided by the `-XX:HeapDumpPath`[1] option? Our use-case entails ensuring the created path is known beforehand to running the JVM itself, and this is ensured by using `mktemp`[2]. 
In short, because of how `mktemp` works, it will create an empty file to mark it used; this fact is used to ensure there are no name collisions, as it will simply re-generate a file name until it finds a non-existent file. We would like to preserve this ability (meaning we cannot use the "unsafe" option `--dry-run`). Feel free to move me to another mailing list if I chose the wrong one. I figure it was either this one or hotspot-runtime-dev at . Cheers, Mariell Hoversholm (she/they) [1]: https://github.com/openjdk/jdk17u/blob/d5fedc5b5fdfaa852894b6374873012645576f15/src/hotspot/share/runtime/globals.hpp#L537-L540 [2]: https://manpages.debian.org/buster/coreutils/mktemp.1.en.html -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20221121/e2e78c0e/attachment.htm> From alanb at openjdk.org Mon Nov 21 14:24:40 2022 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 21 Nov 2022 14:24:40 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v4] In-Reply-To: <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> Message-ID: <t3qA0q8LZQRcbUbd-BZuJDTfdyRcrQ44W_n14Up7N4w=.1d5a8119-0eb5-4ec2-a86b-3ef5f0cd283c@github.com> On Sun, 20 Nov 2022 08:48:16 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The can_support_virtual_thread was initially implemented as an onload capability. >> It is why this capability does not work for the agents loaded into running VM. >> The fix is to move it from `onload` to `always`capabilities list. >> >> Testing: >> New test is added: VirtualStartThreadTest. >> TBD: mach5 jvmti, jdi and tier1-6 tests. 
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > fixed a trailing white space issue Marked as reviewed by alanb (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11246 From alanb at openjdk.org Mon Nov 21 14:24:40 2022 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 21 Nov 2022 14:24:40 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v4] In-Reply-To: <dC6q9bNvUqayvKjGi2MJk2QrtuuH5CkprslFHVDmq3E=.bb3ff667-5c6d-4251-9de3-7e27bfdc64b2@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> <dC6q9bNvUqayvKjGi2MJk2QrtuuH5CkprslFHVDmq3E=.bb3ff667-5c6d-4251-9de3-7e27bfdc64b2@github.com> Message-ID: <uZvaWGIY0Asr_a_nTZAzt7EKVskDtTg7AyLaYC-P4a0=.ae13b513-be1f-4dba-8fb5-7d0187588f92@github.com> On Sun, 20 Nov 2022 08:53:14 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: > I've pushed an update to extend the test to cover more configurations: > > * agent loaded at startup and into running VM > * with and without enabling JVMTI `can_support_virtual_classes` capability Good, that makes for a much more complete test. 
-------------

PR: https://git.openjdk.org/jdk/pull/11246

From shade at redhat.com Mon Nov 21 14:39:34 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 21 Nov 2022 15:39:34 +0100
Subject: Adding -XX option for overwriting HeapDumpPath
In-Reply-To: <CAAZhyNNLLLX=7pJACguiMZwKJV0W32MQQt-ZDFcYW0ANBTOmwg@mail.gmail.com>
References: <CAAZhyNNqk4R1yvWGViWWfOwM1=bK6k4R7sPxYu4w3js7Zeo9PA@mail.gmail.com>
 <CAAZhyNNLLLX=7pJACguiMZwKJV0W32MQQt-ZDFcYW0ANBTOmwg@mail.gmail.com>
Message-ID: <bcf7d17c-c4e0-65e7-465e-b7b2d354c163@redhat.com>

On 11/21/22 15:21, Mariell Hoversholm wrote:
> Would it be possible to add a new option for overwriting the file provided by the
> `-XX:HeapDumpPath`[1] option?

It is technically simple to do: define a flag in globals.hpp and pass it in HeapDumper::dump_heap to
HeapDumper::dump, which already takes "overwrite" argument. But the larger question is if this
warrants the extension of (product) flag set. Once we add the flag, it adds up to maintenance costs,
and would require multiple JDK releases to get rid of, once unused.

> Our use-case entails ensuring the created path is known beforehand to running the JVM itself, and
> this is ensured by using `mktemp`[2]. In short, because of how `mktemp` works, it will create an
> empty file to mark it used; this fact is used to ensure there are no name collisions, as it will
> simply re-generate a file name until it finds a non-existent file. We would like to preserve this
> ability (meaning we cannot use the "unsafe" option `--dry-run`).

Is your use case specifically about HeapDumpOnOutOfMemoryError? Because I think if you request
heapdump through jcmd, then you would get `-overwrite` option today:

$ jcmd 2383127 GC.heap_dump idea.heap
2383127:
Dumping heap to idea.heap ...
Heap dump file created [1810669511 bytes in 14.977 secs]

$ jcmd 2383127 GC.heap_dump idea.heap
2383127:
Dumping heap to idea.heap ...
Unable to create idea.heap: File exists

$ jcmd 2383127 GC.heap_dump idea.heap -overwrite
2383127:
Dumping heap to idea.heap ...
Heap dump file created [1806642566 bytes in 14.780 secs]

--
Thanks,
-Aleksey

From mariellh at spotify.com Mon Nov 21 14:50:57 2022
From: mariellh at spotify.com (Mariell Hoversholm)
Date: Mon, 21 Nov 2022 15:50:57 +0100
Subject: Adding -XX option for overwriting HeapDumpPath
In-Reply-To: <bcf7d17c-c4e0-65e7-465e-b7b2d354c163@redhat.com>
References: <CAAZhyNNqk4R1yvWGViWWfOwM1=bK6k4R7sPxYu4w3js7Zeo9PA@mail.gmail.com>
 <CAAZhyNNLLLX=7pJACguiMZwKJV0W32MQQt-ZDFcYW0ANBTOmwg@mail.gmail.com>
 <bcf7d17c-c4e0-65e7-465e-b7b2d354c163@redhat.com>
Message-ID: <CAAZhyNNeZEp6Zxnk4cwsdwSD4Cf3hsMW7WZKBmwCWEW6df75sg@mail.gmail.com>

On Mon, Nov 21, 2022 at 3:39 PM Aleksey Shipilev wrote:
> It is technically simple to do: define a flag in globals.hpp and pass it in HeapDumper::dump_heap to
> HeapDumper::dump, which already takes "overwrite" argument. But the larger question is if this
> warrants the extension of (product) flag set. Once we add the flag, it adds up to maintenance costs,
> and would require multiple JDK releases to get rid of, once unused.

I do not personally care at what level ("extension"?) it is added; I would think something like the `product` or `diagnostic` level would be best, assuming there are some variant of those that are only maintained for the current major version.

> Is your use case specifically about HeapDumpOnOutOfMemoryError?

Yes, it is. Sorry, I should have added that. To be specific, we use `-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath="$(mktemp ...)" -XX:+CrashOnOutOfMemoryError`.

For now, there is a workaround of using `mktemp --dry-run`, but this we would like to avoid, but it is not a huge problem.

Cheers,
Mariell
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20221121/1adf1585/attachment.htm> From rrich at openjdk.org Mon Nov 21 16:15:34 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 21 Nov 2022 16:15:34 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v4] In-Reply-To: <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> Message-ID: <hJlCt5GjiPj1Fmu6RWbS71FY2hSKMSJcIXCanrre3po=.18ee6867-8251-45ab-8bb3-9fa2629111c3@github.com> On Sun, 20 Nov 2022 08:48:16 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The can_support_virtual_thread was initially implemented as an onload capability. >> It is why this capability does not work for the agents loaded into running VM. >> The fix is to move it from `onload` to `always`capabilities list. >> >> Testing: >> New test is added: VirtualStartThreadTest. >> TBD: mach5 jvmti, jdi and tier1-6 tests. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > fixed a trailing white space issue Hi Serguei, besides minor comments the changes look fine to me. Best regards, Richard. src/hotspot/share/prims/jvmtiExport.cpp line 201: > 199: JvmtiVirtualThreadEventMark(JavaThread *thread) : > 200: JvmtiEventMark(thread) { > 201: if (thread->vthread() != NULL) { Can this condition ever be false? 
test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualThreadStartTest/libVirtualThreadStartTest.cpp line 54: > 52: started_thread_cnt++; > 53: } > 54: deallocate(jvmti, jni, (void*)tname); This will crash if `get_thread_name()` returns the string constant `"<Unnamed thread>"` ------------- PR: https://git.openjdk.org/jdk/pull/11246 From kevinw at openjdk.org Mon Nov 21 17:37:26 2022 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 21 Nov 2022 17:37:26 GMT Subject: RFR: 8296265: Use modern HTML in the JVMTI spec In-Reply-To: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com> References: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com> Message-ID: <kwFEjGylYZgmlDZw6IpDL6wlvvoyhTJHsysCDxipQcM=.a68be7f9-40b1-4cdf-b99f-567296f79896@github.com> On Fri, 11 Nov 2022 00:43:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Changes: > - removed `<b>` from TOC; > - added CSS style for TOC (to simplify customization, currently it's empty); > - removed `<b>` from from function list (per Phase); > - removed `<b>` from from list of events; > - introduced CSS style for bold text, replaced `<b>` tags with `<span class="bold">`; > - update transformation rule for `"b"` elements to use `"span class=bold"` (to handle `<b>` tags in source XML file); > - dropped duplicate `"b"` transform. Looks ok to me. Yes, thanks for attaching the example files! ------------- Marked as reviewed by kevinw (Committer). 
PR: https://git.openjdk.org/jdk/pull/11099 From duke at openjdk.org Mon Nov 21 17:44:36 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 21 Nov 2022 17:44:36 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21] In-Reply-To: <Bt4UNZU2itTeHs_2ojFCD64AXpGPiI8gveUtRg5mea0=.2926b137-f31e-4505-9a96-815e4f5ab851@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <oaQBTRrtslpMJcgFY4XfZBse2Vgo9p1EMqvDzE2fFj8=.a7e037a5-2a7c-4f1b-b846-6f6f6394b21e@github.com> <Bt4UNZU2itTeHs_2ojFCD64AXpGPiI8gveUtRg5mea0=.2926b137-f31e-4505-9a96-815e4f5ab851@github.com> Message-ID: <EseUb0cgdeigmu9nNWflTuRMLKo6T0nEruj3TaPqfYQ=.bedc9bf5-5f1f-48be-8846-0064a88036a0@github.com> On Thu, 17 Nov 2022 19:32:28 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> vzeroall, no spill, reg re-map > > Overall, looks good. Just one minor cleanup suggestion. > > I've submitted the latest patch for testing (hs-tier1 - hs-tier4). @iwanowww Hope the extra tests passed? (Or do you have to re-run them on the latest patch again?) 
-------------

PR: https://git.openjdk.org/jdk/pull/10582

From pchilanomate at openjdk.org Mon Nov 21 17:54:08 2022
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 21 Nov 2022 17:54:08 GMT
Subject: RFR: 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing
In-Reply-To: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com>
References: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com>
Message-ID: <pPw8a_Hl8_rt-kSK-Oj8y8_DevZ_2zzsOaSEMgRMeBU=.fe429115-f146-491b-b0a2-c22cf2e2193e@github.com>

On Fri, 18 Nov 2022 12:30:19 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> There is a stack walk in JvmtiExport::post_exception_throw() that has safepoints in it. This trips up the stack watermark code. This patch adds a RAII object to JvmtiExport::post_exception_throw() that keeps the thread and its stack fully processed throughout the function.
> Testing: tier1-7 of ZGC tests on linux x86_64 debug and manual testing of the test that failed.

Looks good to me! The issue can also be easily reproduced by running a simplified variation of test events/Exception/exception01/exception01.java plus forcing a GC safepoint in that method:
https://github.com/pchilano/jdk/commit/9e61eb0e0d624608d981754d3e8b6f50a6accd96

Thanks,
Patricio

-------------

Marked as reviewed by pchilanomate (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11238

From vlivanov at openjdk.org Mon Nov 21 19:00:31 2022
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 21 Nov 2022 19:00:31 GMT
Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v22]
In-Reply-To: <encTlnf9qtjfjtVa-jDoWJMcUc6AwRtSDj7tk_OyBM0=.9728a3c6-6009-4873-9cb3-28ac8c262282@github.com>
References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com>
 <encTlnf9qtjfjtVa-jDoWJMcUc6AwRtSDj7tk_OyBM0=.9728a3c6-6009-4873-9cb3-28ac8c262282@github.com>
Message-ID: <K3ahvJ23eDT3tc4Dn8Chwm8xXb5Xm47iWccekma2bMY=.02bdd22b-866a-4239-8105-a3219e8df259@github.com>

On Thu, 17 Nov 2022 20:42:27 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
>> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
>> - Added a JMH perf test.
>> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>>
>> Perf before:
>>
>> Benchmark                   (dataSize)  (provider)    Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64               thrpt    8  2961300.661 ± 110554.162  ops/s
>> Poly1305DigestBench.digest         256               thrpt    8  1791912.962 ±  86696.037  ops/s
>> Poly1305DigestBench.digest        1024               thrpt    8   637413.054 ±  14074.655  ops/s
>> Poly1305DigestBench.digest       16384               thrpt    8    48762.991 ±    390.921  ops/s
>> Poly1305DigestBench.digest     1048576               thrpt    8      769.872 ±      1.402  ops/s
>>
>> and after:
>>
>> Benchmark                   (dataSize)  (provider)    Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64               thrpt    8  2841243.668 ± 154528.057  ops/s
>> Poly1305DigestBench.digest         256               thrpt    8  1662003.873 ±  95253.445  ops/s
>> Poly1305DigestBench.digest        1024               thrpt    8  1770028.718 ± 100847.766  ops/s
>> Poly1305DigestBench.digest       16384               thrpt    8   765547.287 ±  25883.825  ops/s
>> Poly1305DigestBench.digest     1048576               thrpt    8    14508.458 ±     56.147  ops/s
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
> remove early return

JVM part looks good. The test results look good. (Had to wait until testing is complete.)

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10582

From sspitsyn at openjdk.org Mon Nov 21 19:10:22 2022
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Mon, 21 Nov 2022 19:10:22 GMT
Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v4]
In-Reply-To: <hJlCt5GjiPj1Fmu6RWbS71FY2hSKMSJcIXCanrre3po=.18ee6867-8251-45ab-8bb3-9fa2629111c3@github.com>
References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com>
 <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com>
 <hJlCt5GjiPj1Fmu6RWbS71FY2hSKMSJcIXCanrre3po=.18ee6867-8251-45ab-8bb3-9fa2629111c3@github.com>
Message-ID: <OXMwMe-mREvGBuUIpDNxLYYEV6r0AaNYMEvWpI5D140=.8fd7c7a9-537e-4506-97f9-a28451db927f@github.com>

On Mon, 21 Nov 2022 15:58:31 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>>
>> fixed a trailing white space issue
>
> src/hotspot/share/prims/jvmtiExport.cpp line 201:
>
>> 199: JvmtiVirtualThreadEventMark(JavaThread *thread) :
>> 200: JvmtiEventMark(thread) {
>> 201: if (thread->vthread() != NULL) {
>
> Can this condition ever be false?

Yes, this condition can be false for platform threads.
------------- PR: https://git.openjdk.org/jdk/pull/11246 From sspitsyn at openjdk.org Mon Nov 21 20:19:47 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 21 Nov 2022 20:19:47 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v5] In-Reply-To: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> Message-ID: <M6_9j3bH1ryfow-mbyX_niBEyuSXv8gSYj44iYys36o=.ca2328d0-42ab-4518-90f5-015ae7a84555@github.com> > The can_support_virtual_thread was initially implemented as an onload capability. > It is why this capability does not work for the agents loaded into running VM. > The fix is to move it from `onload` to `always`capabilities list. > > Testing: > New test is added: VirtualStartThreadTest. > TBD: mach5 jvmti, jdi and tier1-6 tests. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor update for unnamed threads in jvmti_common.h ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11246/files - new: https://git.openjdk.org/jdk/pull/11246/files/8e408555..7c9c3b5b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=03-04 Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11246.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11246/head:pull/11246 PR: https://git.openjdk.org/jdk/pull/11246 From sspitsyn at openjdk.org Mon Nov 21 20:23:25 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 21 Nov 2022 20:23:25 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v4] In-Reply-To: <hJlCt5GjiPj1Fmu6RWbS71FY2hSKMSJcIXCanrre3po=.18ee6867-8251-45ab-8bb3-9fa2629111c3@github.com> References: 
<1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> <hJlCt5GjiPj1Fmu6RWbS71FY2hSKMSJcIXCanrre3po=.18ee6867-8251-45ab-8bb3-9fa2629111c3@github.com> Message-ID: <1-7s2ly8ZxgzHZouAyRPsmoGCS4G6nuk2NO1fHmBGaA=.0cc5b700-b9a0-4ae2-a3b2-329c08c40607@github.com> On Mon, 21 Nov 2022 16:02:19 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed a trailing white space issue > > test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualThreadStartTest/libVirtualThreadStartTest.cpp line 54: > >> 52: started_thread_cnt++; >> 53: } >> 54: deallocate(jvmti, jni, (void*)tname); > > This will crash if `get_thread_name()` returns the string constant `"<Unnamed thread>"` Nice catch. It is strange I've never seen any related crashes. This could be addressed separately but I've fixed it now. Please, let me know if you are okay with it. Will submit more mach5 runs to be safe. ------------- PR: https://git.openjdk.org/jdk/pull/11246 From duke at openjdk.org Mon Nov 21 21:05:39 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 21 Nov 2022 21:05:39 GMT Subject: Integrated: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions In-Reply-To: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> Message-ID: <JY-BfrL1Sa8AycQ-uBdqwju80n0b8rwxvFEHQxVIyyU=.33674f8b-e1fe-4775-aeec-80dc48039d1f@github.com> On Wed, 5 Oct 2022 21:28:26 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. 
>
> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
> - Added a JMH perf test.
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>
> Perf before:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>
> and after:
>
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s

This pull request has now been integrated.
Changeset: f12710e9 Author: Volodymyr Paprotski <volodymyr.paprotski at intel.com> Committer: Sandhya Viswanathan <sviswanathan at openjdk.org> URL: https://git.openjdk.org/jdk/commit/f12710e938b36594623e9c82961d8aa0c0ef29c2 Stats: 1860 lines in 32 files changed: 1824 ins; 3 del; 33 mod 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions Reviewed-by: sviswanathan, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/10582 From sspitsyn at openjdk.org Mon Nov 21 21:50:24 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 21 Nov 2022 21:50:24 GMT Subject: RFR: 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing In-Reply-To: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com> References: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com> Message-ID: <_ghcyBnixrQH1t37f2zHHh48L_NvNXisvhteXFrfBJ4=.add33955-3a45-45c7-b16e-1ae02d8e711e@github.com> On Fri, 18 Nov 2022 12:30:19 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: > There is a stack walk in JvmtiExport::post_exception_throw() that has safepoints in it. This trips up the stack watermark code. This patch adds a RAII object to JvmtiExport::post_exception_throw() that keeps the thread and its stack fully processed throughout the function. > Testing: tier1-7 of ZGC tests on linux x86_64 debug and manual testing of the test that failed. Thank you for fixing this! Looks good. Added a minor comment. Thanks, Serguei src/hotspot/share/prims/jvmtiExport.cpp line 1969: > 1967: Handle exception_handle(thread, exception); > 1968: KeepStackGCProcessedMark ksgcpm(thread); > 1969: Nit: It'd be nice to place a small comment about why it is needed. ------------- Marked as reviewed by sspitsyn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11238 From mcimadamore at openjdk.org Mon Nov 21 21:56:01 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 21 Nov 2022 21:56:01 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v28] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <0y8JsbYwjCSOwW3GHC-7W3u4__AjjxlyQ_GZHR1YBtk=.f17cdd0e-652b-4906-a9a3-bf53e3d4768a@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: - Address more review comments - Fix bad @throws in MemorySegment::copy methods - Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/876587c3..a0cee7b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=26-27 Stats: 19 lines in 4 files changed: 8 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From rrich at openjdk.org Mon Nov 21 22:36:13 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 21 Nov 2022 22:36:13 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v4] In-Reply-To: <OXMwMe-mREvGBuUIpDNxLYYEV6r0AaNYMEvWpI5D140=.8fd7c7a9-537e-4506-97f9-a28451db927f@github.com> References: 
<1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> <hJlCt5GjiPj1Fmu6RWbS71FY2hSKMSJcIXCanrre3po=.18ee6867-8251-45ab-8bb3-9fa2629111c3@github.com> <OXMwMe-mREvGBuUIpDNxLYYEV6r0AaNYMEvWpI5D140=.8fd7c7a9-537e-4506-97f9-a28451db927f@github.com> Message-ID: <MNH2D_ss1Odt62wBvfHpKDlkUodtM4hylDUDbCGqS-4=.109461e0-54ba-4a82-848c-49bb1c1c5509@github.com> On Mon, 21 Nov 2022 19:08:16 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> src/hotspot/share/prims/jvmtiExport.cpp line 201: >> >>> 199: JvmtiVirtualThreadEventMark(JavaThread *thread) : >>> 200: JvmtiEventMark(thread) { >>> 201: if (thread->vthread() != NULL) { >> >> Can this condition ever be false? > > Yes, this condition can be false for platform threads. The [comment](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/javaThread.hpp#L88) suggests that `vthread()` cannot evaluate to NULL. Otherwise `Thread.currentThread()` would return null too. I've experimented a little bit and found that `thread->vthread()` in fact can be NULL during initialization but then `thread->threadObj()` is also not yet initialized. E.g. in `Threads::initialize_java_lang_classes()` class events are generated before `create_initial_thread()` is reached. The constructor of `JvmtiClassEventMark` calls its parent class' constructor, that is `JvmtiVirtualThreadEventMark`, with a not yet fully initialized initial thread. I found that `jtreg:test/hotspot/jtreg:hotspot_serviceability` succeeds with `assert(thread->vthread() != NULL || thread->threadObj() == NULL, "");` So you could unconditionally use `thread->vthread()`. 
>> test/hotspot/jtreg/serviceability/jvmti/vthread/VirtualThreadStartTest/libVirtualThreadStartTest.cpp line 54: >> >>> 52: started_thread_cnt++; >>> 53: } >>> 54: deallocate(jvmti, jni, (void*)tname); >> >> This will crash if `get_thread_name()` returns the string constant `"<Unnamed thread>"`. > > Nice catch. > It is strange I've never seen any related crashes. > This could be addressed separately but I've fixed it now. > Please, let me know if you are okay with it. > Will submit more mach5 runs to be safe. The fix looks good to me. Thanks for taking care of the issue right away. ------------- PR: https://git.openjdk.org/jdk/pull/11246 From sspitsyn at openjdk.org Mon Nov 21 23:18:28 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 21 Nov 2022 23:18:28 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v5] In-Reply-To: <M6_9j3bH1ryfow-mbyX_niBEyuSXv8gSYj44iYys36o=.ca2328d0-42ab-4518-90f5-015ae7a84555@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <M6_9j3bH1ryfow-mbyX_niBEyuSXv8gSYj44iYys36o=.ca2328d0-42ab-4518-90f5-015ae7a84555@github.com> Message-ID: <0HIetDbgCArYdScZZdNyQaV_evPU6Yv3lHKgBR8-TDQ=.ea0a1567-3a1e-440f-9795-68ceac253368@github.com> On Mon, 21 Nov 2022 20:19:47 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The can_support_virtual_threads capability was initially implemented as an onload capability. >> That is why this capability does not work for agents loaded into a running VM. >> The fix is to move it from the `onload` to the `always` capabilities list. >> >> Testing: >> New test is added: VirtualThreadStartTest. >> TBD: mach5 jvmti, jdi and tier1-6 tests. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor update for unnamed threads in jvmti_common.h Alan and Richard, thank you for the reviews.
------------- PR: https://git.openjdk.org/jdk/pull/11246 From sviswanathan at openjdk.org Tue Nov 22 00:07:26 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 22 Nov 2022 00:07:26 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> References: <dVSdMEOd_hypt89L5-2Hcx56M11WYpGwsHh33lHgxbY=.0a3e0288-8498-4166-b40b-e9851222ad64@github.com> <pW8HHmh-dQnjrRftd052fBQVulqe_z5KldY76Hp-OmI=.a7a2b6e6-a182-45d2-bd11-16a406126482@github.com> Message-ID: <lkdAp1le7HB7nOOHfadBJiXFWUU35nWKGfPL4JkPrLY=.59a05155-8ddd-4f46-add8-d423c4dcf86d@github.com>

On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>>
>> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>>
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
> Missing & 0xff in StringLatin1::hashCode

We have seen this as a hotspot in workloads. It will be good to optimize the StringUTF16 and StringLatin1 hash code computation.
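
[Editorial sketch] For readers following this thread, the snippet below illustrates what unrolling a polynomial hash loop by hand can look like in plain Java. It is only a sketch under stated assumptions: the class `PolyHashSketch` and its methods are hypothetical names made up for illustration, the unroll factor of four is arbitrary, and none of this is taken from the PR itself. It relies only on the facts quoted above (factor 31, a caller-supplied start value, and the `& 0xff` masking for Latin1 bytes).

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: PolyHashSketch is a hypothetical class, not JDK code
// and not the implementation from PR 10847.
final class PolyHashSketch {

    // Reference loop: h = 31 * h + (b & 0xff), one element per iteration.
    static int hashScalar(byte[] a, int initial) {
        int h = initial;
        for (byte b : a) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    // Same result with four elements folded into each iteration.
    // Expanding four steps of the reference loop gives
    //   h' = 31^4*h + 31^3*a0 + 31^2*a1 + 31*a2 + a3
    // and int overflow wraps identically in both forms.
    static int hashUnrolled(byte[] a, int initial) {
        int h = initial;
        int i = 0;
        int bound = a.length & ~3; // largest multiple of 4 not above length
        for (; i < bound; i += 4) {
            h = 923521 * h                  // 31^4
              + 29791 * (a[i] & 0xff)       // 31^3
              + 961 * (a[i + 1] & 0xff)     // 31^2
              + 31 * (a[i + 2] & 0xff)
              + (a[i + 3] & 0xff);
        }
        for (; i < a.length; i++) {         // scalar tail for the remainder
            h = 31 * h + (a[i] & 0xff);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "polynomial hash";
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        // Both comparisons should agree for a Latin1-encodable string.
        System.out.println(hashScalar(latin1, 0) == hashUnrolled(latin1, 0));
        System.out.println(hashUnrolled(latin1, 0) == s.hashCode());
    }
}
```

The unrolled form breaks the serial dependency on `h` into one multiply per four elements, which is the property a vectorized intrinsic can exploit further; whether it is faster in practice depends on the JIT and hardware, so the benchmark numbers quoted above remain the only performance data in this thread.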
------------- PR: https://git.openjdk.org/jdk/pull/10847 From dholmes at openjdk.org Tue Nov 22 00:45:57 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 22 Nov 2022 00:45:57 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v22] In-Reply-To: <encTlnf9qtjfjtVa-jDoWJMcUc6AwRtSDj7tk_OyBM0=.9728a3c6-6009-4873-9cb3-28ac8c262282@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <encTlnf9qtjfjtVa-jDoWJMcUc6AwRtSDj7tk_OyBM0=.9728a3c6-6009-4873-9cb3-28ac8c262282@github.com> Message-ID: <PLZFHnt40pDDB__eeQp57j8KTKs1CmndGhvQCQlbiiY=.55f9fa6d-7716-4cac-9a3a-6b825455ea48@github.com>

On Thu, 17 Nov 2022 20:42:27 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote:

>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java.
>> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please.
>> - Added a JMH perf test.
>> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider.
>>
>> Perf before:
>>
>> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
>> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
>> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
>> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
>> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
>>
>> and after:
>>
>> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
>> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
>> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
>> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
>> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
>> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
> remove early return

Testing is broken:

test/jdk/sun/security/util/math/BigIntegerModuloP.java:160: error: BigIntegerModuloP.ImmutableElement is not abstract and does not override abstract method getLimbs() in IntegerModuloP
    private class ImmutableElement extends Element

Did you forget to commit a test file? I will file a new bug for this.

------------- PR: https://git.openjdk.org/jdk/pull/10582 From vlivanov at openjdk.org Tue Nov 22 00:52:25 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 22 Nov 2022 00:52:25 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v7] In-Reply-To: <Nzi7QqlRzEE_tu6l1g54E7c5BAGIg9tSyaR4nk_fr8E=.df313bdb-84b6-4ff4-b3b5-22ed34a26b2c@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <Nzi7QqlRzEE_tu6l1g54E7c5BAGIg9tSyaR4nk_fr8E=.df313bdb-84b6-4ff4-b3b5-22ed34a26b2c@github.com> Message-ID: <8WIJU705KWrVpK3zR1FhwrrpqSbxiUUm_aJaRR0Zdzk=.1594761d-bc52-4968-a4ca-262a2a7cb55b@github.com> On Fri, 18 Nov 2022 14:54:52 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. >> >> This is split off from the main JEP integration to make reviewing easier. >> >> This includes the following patches: >> >> 1. https://github.com/openjdk/panama-foreign/pull/698 >> 2. https://github.com/openjdk/panama-foreign/pull/699 >> 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 >> 4.
https://github.com/openjdk/panama-foreign/pull/740 >> 5. https://github.com/openjdk/panama-foreign/pull/746 >> 6. https://github.com/openjdk/panama-foreign/pull/742 >> 7. https://github.com/openjdk/panama-foreign/pull/743 >> >> Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. >> >> The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. >> >> Please refer to the PR of each individual patch for a more detailed description. > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > 8296973: saving errno on a value-returning function crashes the JVM > > Reviewed-by: mcimadamore Unless I'm missing something important, `vmstorage.inline.hpp`/`vmstorage_<cpu>.inline.hpp` should be named `vmstorage.hpp`/`vmstorage_<cpu>.hpp`. Otherwise, looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11019 From vlivanov at openjdk.org Tue Nov 22 00:53:27 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 22 Nov 2022 00:53:27 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v4] In-Reply-To: <7dgYny7behfzrurUf7PH-jfhKYdDIMV4hInwIlZwg-Y=.9a97cf5b-fe58-4855-911b-43ec710c9539@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <7dgYny7behfzrurUf7PH-jfhKYdDIMV4hInwIlZwg-Y=.9a97cf5b-fe58-4855-911b-43ec710c9539@github.com> Message-ID: <Lqk3lBEBV7QKGZ2IhKVF0ekFlGOypcnAydHG_y1lZlE=.6199a3af-55f3-4c6c-bed5-4448e83faeb6@github.com> On Mon, 21 Nov 2022 06:28:32 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: >> >> - x86_64: AVX, AVX2 and AVX512 >> - aarch64: platforms that support the advanced SIMD instructions >> >> Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. >> >> Special thanks to the folks who have made many helpful comments while this PR was in draft form. > > Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: > > Pull out common macro code into function parameter pack Looks good. src/hotspot/cpu/x86/stubGenerator_x86_64_chacha.cpp line 107: > 105: if (VM_Version::supports_evex()) { > 106: StubRoutines::_chacha20Block = generate_chacha20Block_avx512(); > 107: } else { // Either AVX or AVX2 is supported Worth to supplement the comment with an assert (either `UseAVX > 0` or `VM_Version::supports_avx() == true`). ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/7702 From jvernee at openjdk.org Tue Nov 22 01:29:58 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 22 Nov 2022 01:29:58 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v8] In-Reply-To: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> Message-ID: <FUBlx8-FuuwWVr-mTQFfrxgNpOTdoasTS6Ko_hUqnsw=.0635ea9a-b7a0-4d2f-955d-8e49b589c6d5@github.com> > Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. > > This is split off from the main JEP integration to make reviewing easier. > > This includes the following patches: > > 1. https://github.com/openjdk/panama-foreign/pull/698 > 2. https://github.com/openjdk/panama-foreign/pull/699 > 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 > 4. https://github.com/openjdk/panama-foreign/pull/740 > 5. https://github.com/openjdk/panama-foreign/pull/746 > 6. https://github.com/openjdk/panama-foreign/pull/742 > 7. https://github.com/openjdk/panama-foreign/pull/743 > > Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. > > The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. > > Please refer to the PR of each individual patch for a more detailed description. 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: drop .inline from vmstorage header names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11019/files - new: https://git.openjdk.org/jdk/pull/11019/files/0fa0e8cf..03be64c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=06-07 Stats: 7 lines in 11 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/11019.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11019/head:pull/11019 PR: https://git.openjdk.org/jdk/pull/11019 From jvernee at openjdk.org Tue Nov 22 01:29:59 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 22 Nov 2022 01:29:59 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v7] In-Reply-To: <Nzi7QqlRzEE_tu6l1g54E7c5BAGIg9tSyaR4nk_fr8E=.df313bdb-84b6-4ff4-b3b5-22ed34a26b2c@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> <Nzi7QqlRzEE_tu6l1g54E7c5BAGIg9tSyaR4nk_fr8E=.df313bdb-84b6-4ff4-b3b5-22ed34a26b2c@github.com> Message-ID: <Pel5bspp9jSa_mZCIjiXwFdR6OM6VeULKxNET_iiS6g=.091034b1-faae-44b9-be32-22dfc76c472e@github.com> On Fri, 18 Nov 2022 14:54:52 GMT, Jorn Vernee <jvernee at openjdk.org> wrote: >> Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. >> >> This is split off from the main JEP integration to make reviewing easier. >> >> This includes the following patches: >> >> 1. https://github.com/openjdk/panama-foreign/pull/698 >> 2. https://github.com/openjdk/panama-foreign/pull/699 >> 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 >> 4. https://github.com/openjdk/panama-foreign/pull/740 >> 5. https://github.com/openjdk/panama-foreign/pull/746 >> 6. https://github.com/openjdk/panama-foreign/pull/742 >> 7. 
https://github.com/openjdk/panama-foreign/pull/743 >> >> Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. >> >> The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. >> >> Please refer to the PR of each individual patch for a more detailed description. > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > 8296973: saving errno on a value-returning function crashes the JVM > > Reviewed-by: mcimadamore Thanks for the review, I've dropped the `.inline` from the header file names. ------------- PR: https://git.openjdk.org/jdk/pull/11019 From dholmes at openjdk.org Tue Nov 22 02:30:22 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 22 Nov 2022 02:30:22 GMT Subject: RFR: 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing In-Reply-To: <_ghcyBnixrQH1t37f2zHHh48L_NvNXisvhteXFrfBJ4=.add33955-3a45-45c7-b16e-1ae02d8e711e@github.com> References: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com> <_ghcyBnixrQH1t37f2zHHh48L_NvNXisvhteXFrfBJ4=.add33955-3a45-45c7-b16e-1ae02d8e711e@github.com> Message-ID: <DCBR7PSnZmyDLZYz6PerijTA5AXEfaTE56IjgfyG_OA=.f24a3e3e-0da2-4f08-bebe-a26a3c208704@github.com> On Mon, 21 Nov 2022 21:44:14 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> There is a stack walk in JvmtiExport::post_exception_throw() that has safepoints in it. This trips up the stack watermark code. This patch adds a RAII object to JvmtiExport::post_exception_throw() that keeps the thread and its stack fully processed throughout the function. 
>> Testing: tier1-7 of ZGC tests on linux x86_64 debug and manual testing of the test that failed. > > src/hotspot/share/prims/jvmtiExport.cpp line 1969: > >> 1967: Handle exception_handle(thread, exception); >> 1968: KeepStackGCProcessedMark ksgcpm(thread); >> 1969: > > Nit: It'd be nice to place a small comment about why it is needed. +1 I've never even heard of this thing let alone understand when I would need to use it. ------------- PR: https://git.openjdk.org/jdk/pull/11238 From duke at openjdk.org Tue Nov 22 03:05:19 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Tue, 22 Nov 2022 03:05:19 GMT Subject: RFR: 8296263: Uniform APIs for using archived heap regions In-Reply-To: <O6qz1pX6yNTe8awXHtS1pX_z5PYp40KZcgO-pBDStcg=.05245d89-c76d-4102-ba18-7a9a2a48001a@github.com> References: <3yfa0M_ZNG6oyLFj9qM9JYXyX-qzusaHw7R54wddmbE=.22a4a865-bb12-4d17-9d6a-cf95e2cc430f@github.com> <TRoDLfcCCxNIGwWPb4W1eJtT7BAp2zjZhFjvSH0aleM=.1fa97bf3-ec5c-464b-86a1-7d320b1f1178@github.com> <naoq1y802BM2oWDp-qLDcaGWfHUx9egRUNgeNXoFuOM=.3a96c815-ddfb-4b32-8d35-29de94db7295@github.com> <E4vmiApqmu80hBu0GrQlPdpoJQt-HJellO_d_vWMKYo=.42ab8128-bfae-4c02-961a-196f625c327d@github.com> <O6qz1pX6yNTe8awXHtS1pX_z5PYp40KZcgO-pBDStcg=.05245d89-c76d-4102-ba18-7a9a2a48001a@github.com> Message-ID: <ksdNsvjIsqAxL1fWmh9dQvoLJa8GAxkRYlCwpn1xwzg=.fd5f3c38-fb4d-40b6-aa44-9a757ae64170@github.com> On Mon, 21 Nov 2022 06:51:37 GMT, Ioi Lam <iklam at openjdk.org> wrote: > I lost my old build so I rebuilt from version [565e6ff](https://github.com/openjdk/jdk/pull/10970/commits/565e6ffd68d67a94a3ffff76734005492b98dd9d) and I couldn't reproduce the problem anymore. Maybe I had a glitch in my build. Sorry for the noise. @iklam no worries, thanks for checking again and clearing it up.
------------- PR: https://git.openjdk.org/jdk/pull/10970 From sspitsyn at openjdk.org Tue Nov 22 05:15:01 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 22 Nov 2022 05:15:01 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v6] In-Reply-To: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> Message-ID: <uOdkRvSOkER6wKrIGIlmSUzt7goT9NjQLrjkZDtBLjU=.9dda8f1f-2f82-48b3-b004-94b1130432a7@github.com> > The can_support_virtual_thread was initially implemented as an onload capability. > It is why this capability does not work for the agents loaded into running VM. > The fix is to move it from `onload` to `always`capabilities list. > > Testing: > New test is added: VirtualStartThreadTest. > TBD: mach5 jvmti, jdi and tier1-6 tests. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: removed thread->vthread() != NULL from JvmtiVirtualThreadEventMark constructor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11246/files - new: https://git.openjdk.org/jdk/pull/11246/files/7c9c3b5b..1a8f9810 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=04-05 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11246.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11246/head:pull/11246 PR: https://git.openjdk.org/jdk/pull/11246 From sspitsyn at openjdk.org Tue Nov 22 05:16:46 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 22 Nov 2022 05:16:46 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v4] In-Reply-To: 
<MNH2D_ss1Odt62wBvfHpKDlkUodtM4hylDUDbCGqS-4=.109461e0-54ba-4a82-848c-49bb1c1c5509@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <kUAPPkMK_0zkn4Sx_sINzMaGnhDkB6PA6uotYbI2400=.ed92e918-d43c-4f15-8254-161a9f2b23a9@github.com> <hJlCt5GjiPj1Fmu6RWbS71FY2hSKMSJcIXCanrre3po=.18ee6867-8251-45ab-8bb3-9fa2629111c3@github.com> <OXMwMe-mREvGBuUIpDNxLYYEV6r0AaNYMEvWpI5D140=.8fd7c7a9-537e-4506-97f9-a28451db927f@github.com> <MNH2D_ss1Odt62wBvfHpKDlkUodtM4hylDUDbCGqS-4=.109461e0-54ba-4a82-848c-49bb1c1c5509@github.com> Message-ID: <min-Y5EQiCklK5yhtcRQbiyh2PxfSywtUR6fFxXAXd0=.0a25c9cf-6c6e-4b45-a752-a383cdd36fea@github.com> On Mon, 21 Nov 2022 22:30:28 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> Yes, this condition can be false for platform threads. > > The [comment](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/javaThread.hpp#L88) suggests that `vthread()` cannot evaluate to NULL. Otherwise `Thread.currentThread()` would return null too. > > I've experimented a little bit and found that `thread->vthread()` in fact can be NULL during initialization but then `thread->threadObj()` is also not yet initialized. E.g. in `Threads::initialize_java_lang_classes()` class events are generated before `create_initial_thread()` is reached. The constructor of `JvmtiClassEventMark` calls its parent class' constructor, that is `JvmtiVirtualThreadEventMark`, with a not yet fully initialized initial thread. > > I found that `jtreg:test/hotspot/jtreg:hotspot_serviceability` succeeds with `assert(thread->vthread() != NULL || thread->threadObj() == NULL, "");` > So you could unconditionally use `thread->vthread()`. Got it, thanks. Removed the check for NULL and added the assert. The simplification is minor but still worth it. Please, let me know if you are okay with the update. 
------------- PR: https://git.openjdk.org/jdk/pull/11246 From jnimeh at openjdk.org Tue Nov 22 05:28:05 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Tue, 22 Nov 2022 05:28:05 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v5] In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> Message-ID: <7-omMJqslZjYB-nYcKyasj5bkNSDyCTXyj53yg7hpPI=.d9b796da-cf58-4591-8767-72beb67faf1e@github.com> > This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: > > - x86_64: AVX, AVX2 and AVX512 > - aarch64: platforms that support the advanced SIMD instructions > > Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. > > Special thanks to the folks who have made many helpful comments while this PR was in draft form. Jamil Nimeh has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: - Merge with main - Add AVX assertion guard - Pull out common macro code into function parameter pack - replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations - Change intrinsic helper method name conform to convention - consolidate chacha macroAssembler routines into chacha stubGenerator file - More indentation fixes on aarch64 - rename chapoly->chacha for macro file - rename chacha macro file to be consistent with x86_64 naming - Fix indentation issues - ... 
and 40 more: https://git.openjdk.org/jdk/compare/392ac705...bb3f4264 ------------- Changes: https://git.openjdk.org/jdk/pull/7702/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=7702&range=04 Stats: 1593 lines in 28 files changed: 1555 ins; 4 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/7702.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7702/head:pull/7702 PR: https://git.openjdk.org/jdk/pull/7702 From jnimeh at openjdk.org Tue Nov 22 05:29:56 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Tue, 22 Nov 2022 05:29:56 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v4] In-Reply-To: <Lqk3lBEBV7QKGZ2IhKVF0ekFlGOypcnAydHG_y1lZlE=.6199a3af-55f3-4c6c-bed5-4448e83faeb6@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <7dgYny7behfzrurUf7PH-jfhKYdDIMV4hInwIlZwg-Y=.9a97cf5b-fe58-4855-911b-43ec710c9539@github.com> <Lqk3lBEBV7QKGZ2IhKVF0ekFlGOypcnAydHG_y1lZlE=.6199a3af-55f3-4c6c-bed5-4448e83faeb6@github.com> Message-ID: <d44txx0W_WbncibBXz1b87LUpdop3VWPu3Cko5Os9O0=.08bb0ff3-45bf-457b-bd83-69395f85ef18@github.com> On Mon, 21 Nov 2022 19:06:49 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: >> >> Pull out common macro code into function parameter pack > > src/hotspot/cpu/x86/stubGenerator_x86_64_chacha.cpp line 107: > >> 105: if (VM_Version::supports_evex()) { >> 106: StubRoutines::_chacha20Block = generate_chacha20Block_avx512(); >> 107: } else { // Either AVX or AVX2 is supported > > Worth to supplement the comment with an assert (either `UseAVX > 0` or `VM_Version::supports_avx() == true`). Good catch. This has been implemented. 
------------- PR: https://git.openjdk.org/jdk/pull/7702 From sspitsyn at openjdk.org Tue Nov 22 05:57:37 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 22 Nov 2022 05:57:37 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v7] In-Reply-To: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> Message-ID: <80rNrQw5nENgWF8B1AMpS9DTjCEqttpX-O86Hn4HtL8=.034efc1e-6582-4bb6-ab99-39040d2951b7@github.com> > The can_support_virtual_thread was initially implemented as an onload capability. > It is why this capability does not work for the agents loaded into running VM. > The fix is to move it from `onload` to `always`capabilities list. > > Testing: > New test is added: VirtualStartThreadTest. > TBD: mach5 jvmti, jdi and tier1-6 tests. Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge branch 'master' into br19 Merge - removed thread->vthread() != NULL from JvmtiVirtualThreadEventMark constructor - minor update for unnamed threads in jvmti_common.h - fixed a trailing white space issue - extended VirtualThreadStartTest to support more configs; fixed issue in jvmtiExport.cpp - roll back unintended VirtualThread.java file update - simplified VirtualThreadStartTest - 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11246/files - new: https://git.openjdk.org/jdk/pull/11246/files/1a8f9810..7c659909 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=05-06 Stats: 18790 lines in 156 files changed: 6558 ins; 2951 del; 9281 mod Patch: https://git.openjdk.org/jdk/pull/11246.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11246/head:pull/11246 PR: https://git.openjdk.org/jdk/pull/11246 From dholmes at openjdk.org Tue Nov 22 06:39:40 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 22 Nov 2022 06:39:40 GMT Subject: RFR: 8297106: Remove the -Xcheck:jni local reference capacity checking [v3] In-Reply-To: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> References: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> Message-ID: <l3xmqRE7yL-cYPAB3c73hQXU01bqircgZB7vSQGgBpw=.a623c9fa-1b09-4405-9307-26986800db67@github.com> > This PR removes the "fake" planned capacity checking mechanism. Please see the JBS issue for the detailed discussion. > > Testing: tiers 1-3 > > Thanks. David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Manpage update - Merge branch 'master' into 8297106-ensure-local-capacity - Removed additional test that no longer applies. - Forgot to commit deleted test file. - Forgot to commit removed test. - 8297106: Remove the -Xcheck:jni local reference capacity checking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11259/files - new: https://git.openjdk.org/jdk/pull/11259/files/3dd0ec0f..1241f503 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11259&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11259&range=01-02 Stats: 22214 lines in 223 files changed: 9254 ins; 3138 del; 9822 mod Patch: https://git.openjdk.org/jdk/pull/11259.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11259/head:pull/11259 PR: https://git.openjdk.org/jdk/pull/11259 From rrich at openjdk.org Tue Nov 22 07:45:55 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 22 Nov 2022 07:45:55 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v7] In-Reply-To: <80rNrQw5nENgWF8B1AMpS9DTjCEqttpX-O86Hn4HtL8=.034efc1e-6582-4bb6-ab99-39040d2951b7@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <80rNrQw5nENgWF8B1AMpS9DTjCEqttpX-O86Hn4HtL8=.034efc1e-6582-4bb6-ab99-39040d2951b7@github.com> Message-ID: <PqT4lMZmekJGd5qxFhi2L9fZ_8pYjlAmITCZFp8iQQI=.ac5af7e5-cf32-4ac0-9a1e-2ea0e4377402@github.com> On Tue, 22 Nov 2022 05:57:37 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The can_support_virtual_thread was initially implemented as an onload capability. >> It is why this capability does not work for the agents loaded into running VM. >> The fix is to move it from `onload` to `always`capabilities list. >> >> Testing: >> New test is added: VirtualStartThreadTest. >> TBD: mach5 jvmti, jdi and tier1-6 tests. 
> > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into br19 > Merge > - removed thread->vthread() != NULL from JvmtiVirtualThreadEventMark constructor > - minor update for unnamed threads in jvmti_common.h > - fixed a trailing white space issue > - extended VirtualThreadStartTest to support more configs; fixed issue in jvmtiExport.cpp > - roll back unintended VirtualThread.java file update > - simplified VirtualThreadStartTest > - 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM Changes look good to me. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR: https://git.openjdk.org/jdk/pull/11246 From kbarrett at openjdk.org Tue Nov 22 08:05:27 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 22 Nov 2022 08:05:27 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v12] In-Reply-To: <mztLRX-PTuyfSXhNZR9d9z8Ax5pIz5j4UVIrIZVGst4=.ddc4ca8b-253a-4e25-96e5-0233465817da@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <mztLRX-PTuyfSXhNZR9d9z8Ax5pIz5j4UVIrIZVGst4=.ddc4ca8b-253a-4e25-96e5-0233465817da@github.com> Message-ID: <NZslB0NAOVhw3IKe-_JWMhfiSJb4p52wHnftlPsz86E=.44faae07-e7c6-4810-aab7-8019c8808c8a@github.com> On Fri, 18 Nov 2022 19:25:32 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
>> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > extra sizeof typo Given all the near-duplicated checking of os::snprintf results, I think there is a place for a helper function to package this up. Maybe something like // in class os // Performs snprintf and asserts the result is non-negative (so there was not // an encoding error) and that the output was not truncated. static int snprintf_checked(char* buf, size_t len, const char* fmt, ...) ATTRIBUTE_PRINTF(3, 4); // in runtime/os.cpp int os::snprintf_checked(char* buf, size_t len, const char* fmt, ...) { va_list args; va_start(args, fmt); int result = os::vsnprintf(buf, len, fmt, args); va_end(args); assert(result >= 0, "os::snprintf error"); assert(static_cast<size_t>(result) < len, "os::snprintf truncated"); return result; } (I keep waffling over whether the truncation check should be an assert or a guarantee.) I've not yet gone through all the changes to consider which should do that checking and which should do something different, such as permitting truncation. I'm not wedded to that name; indeed, I don't like it that much, as it's kind of inconveniently long. There's a temptation to have os::snprintf forbid truncation and a different function that allows it, but that would require careful auditing of all pre-existing uses of os::snprintf too, so no. 
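For readers who want to try the idea outside of HotSpot, the helper proposed above can be sketched as a free function. This is only an illustration under stated assumptions: it calls the standard `std::vsnprintf` instead of `os::vsnprintf`, uses the plain `assert` macro from `<cassert>` rather than HotSpot's two-argument `assert`, and the free-function form of `snprintf_checked` is hypothetical rather than code from the PR.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical standalone analogue of the suggested os::snprintf_checked.
// Formats into buf (capacity len, including the terminating NUL) and asserts
// that no encoding error occurred and that the output was not truncated.
int snprintf_checked(char* buf, std::size_t len, const char* fmt, ...) {
  va_list args;
  va_start(args, fmt);
  // std::vsnprintf returns the number of characters that would have been
  // written given unlimited space (excluding the NUL), or a negative value
  // on an encoding error.
  int result = std::vsnprintf(buf, len, fmt, args);
  va_end(args);
  assert(result >= 0 && "snprintf encoding error");
  // The output fits only if the would-be length is strictly less than len.
  assert(static_cast<std::size_t>(result) < len && "snprintf output truncated");
  return result;
}
```

A caller would use it as a drop-in for snprintf when truncation is considered a bug, for example `snprintf_checked(buf, sizeof(buf), "%s-%d", name, id)`. Call sites that legitimately tolerate truncation would keep calling the unchecked function.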
------------- PR: https://git.openjdk.org/jdk/pull/11115 From sspitsyn at openjdk.org Tue Nov 22 08:18:24 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 22 Nov 2022 08:18:24 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v7] In-Reply-To: <80rNrQw5nENgWF8B1AMpS9DTjCEqttpX-O86Hn4HtL8=.034efc1e-6582-4bb6-ab99-39040d2951b7@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <80rNrQw5nENgWF8B1AMpS9DTjCEqttpX-O86Hn4HtL8=.034efc1e-6582-4bb6-ab99-39040d2951b7@github.com> Message-ID: <ZD6SkJ6MSGGoH9RYcLYDdcjklvd8Yi0b6UgcSirVwuc=.ee7021d8-c46c-4dfb-b850-b806828f0590@github.com> On Tue, 22 Nov 2022 05:57:37 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The can_support_virtual_thread was initially implemented as an onload capability. >> It is why this capability does not work for the agents loaded into running VM. >> The fix is to move it from `onload` to `always`capabilities list. >> >> Testing: >> New test is added: VirtualStartThreadTest. >> TBD: mach5 jvmti, jdi and tier1-6 tests. > > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into br19 > Merge > - removed thread->vthread() != NULL from JvmtiVirtualThreadEventMark constructor > - minor update for unnamed threads in jvmti_common.h > - fixed a trailing white space issue > - extended VirtualThreadStartTest to support more configs; fixed issue in jvmtiExport.cpp > - roll back unintended VirtualThread.java file update > - simplified VirtualThreadStartTest > - 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM Thank you, Richard. 
------------- PR: https://git.openjdk.org/jdk/pull/11246 From mcimadamore at openjdk.org Tue Nov 22 14:48:04 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 22 Nov 2022 14:48:04 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v29] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <bagsYNxX14Px2iI8CRdc2mo4M11QwmnnWJPljCH-gkI=.29852324-8631-4014-8f9f-2a35558ff0d3@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix wrong check in MemorySegment::spliterator/elements (The check which ensures that the segment size is multiple of spliterator element size is bogus) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/a0cee7b0..66dd888d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=27-28 Stats: 29 lines in 2 files changed: 21 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From thartmann at openjdk.org Tue Nov 22 15:25:42 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 22 Nov 2022 15:25:42 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21] In-Reply-To: <EseUb0cgdeigmu9nNWflTuRMLKo6T0nEruj3TaPqfYQ=.bedc9bf5-5f1f-48be-8846-0064a88036a0@github.com> References: 
<wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <oaQBTRrtslpMJcgFY4XfZBse2Vgo9p1EMqvDzE2fFj8=.a7e037a5-2a7c-4f1b-b846-6f6f6394b21e@github.com> <Bt4UNZU2itTeHs_2ojFCD64AXpGPiI8gveUtRg5mea0=.2926b137-f31e-4505-9a96-815e4f5ab851@github.com> <EseUb0cgdeigmu9nNWflTuRMLKo6T0nEruj3TaPqfYQ=.bedc9bf5-5f1f-48be-8846-0064a88036a0@github.com> Message-ID: <K2ZrVCTZ9ZPdbTZRncB_cBAB_PoluPuy0bwzMTmWqZQ=.37ea58fb-e6f0-4a4b-9742-c59c84ac2fb9@github.com> On Mon, 21 Nov 2022 17:42:28 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Overall, looks good. Just one minor cleanup suggestion. >> >> I've submitted the latest patch for testing (hs-tier1 - hs-tier4). > > @iwanowww Hope the extra tests passed? (Or do you have to re-run them on the latest patch again?) I fixed the test issue with [JDK-8297382](https://bugs.openjdk.org/browse/JDK-8297382) but this also caused a regression with one of the crypto tests: [JDK-8297417](https://bugs.openjdk.org/browse/JDK-8297417). @vpaprotsk, @sviswa7 could you please have a look at this? 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Tue Nov 22 15:30:45 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 22 Nov 2022 15:30:45 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21] In-Reply-To: <K2ZrVCTZ9ZPdbTZRncB_cBAB_PoluPuy0bwzMTmWqZQ=.37ea58fb-e6f0-4a4b-9742-c59c84ac2fb9@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <oaQBTRrtslpMJcgFY4XfZBse2Vgo9p1EMqvDzE2fFj8=.a7e037a5-2a7c-4f1b-b846-6f6f6394b21e@github.com> <Bt4UNZU2itTeHs_2ojFCD64AXpGPiI8gveUtRg5mea0=.2926b137-f31e-4505-9a96-815e4f5ab851@github.com> <EseUb0cgdeigmu9nNWflTuRMLKo6T0nEruj3TaPqfYQ=.bedc9bf5-5f1f-48be-8846-0064a88036a0@github.com> <K2ZrVCTZ9ZPdbTZRncB_cBAB_PoluPuy0bwzMTmWqZQ=.37ea58fb-e6f0-4a4b-9742-c59c84ac2fb9@github.com> Message-ID: <Mneyy4gDLmypoILIjA9Tv800aYfLdoJ-mEBrq5egvXg=.9aa50db3-9eeb-436b-9f60-a3807023edd1@github.com> On Tue, 22 Nov 2022 15:21:44 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> @iwanowww Hope the extra tests passed? (Or do you have to re-run them on the latest patch again?) > > I fixed the test issue with [JDK-8297382](https://bugs.openjdk.org/browse/JDK-8297382) but this also caused a regression with one of the crypto tests: [JDK-8297417](https://bugs.openjdk.org/browse/JDK-8297417). @vpaprotsk, @sviswa7 could you please have a look at this? 
@TobiHartmann @dholmes-ora Sorry about that, looking ------------- PR: https://git.openjdk.org/jdk/pull/10582 From sgehwolf at openjdk.org Tue Nov 22 15:55:20 2022 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 22 Nov 2022 15:55:20 GMT Subject: RFR: 8296671: [JFR] jdk.ContainerConfiguration event should include host total memory In-Reply-To: <oeClK9flGtbMJwedbMytX5I_94NaXNkNQ9XMjACBsSI=.7d360f67-460a-4631-9743-32e970b16a9f@github.com> References: <oeClK9flGtbMJwedbMytX5I_94NaXNkNQ9XMjACBsSI=.7d360f67-460a-4631-9743-32e970b16a9f@github.com> Message-ID: <F5MuXpAPEb_Bqj1FrkNaf5wwkHZdc38xIjOLzUq5TX0=.15f21c71-3048-4422-ba1e-9a116f693f21@github.com> On Mon, 14 Nov 2022 20:19:29 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: > Please review this addition to the jdk.ContainerConfiguration event which adds information > about the container host. Specifically, the total amount of memory of the host system. > > Testing: > - [x] New test case (passed, fails before) > - [x] JFR tests. Passed. > > Thoughts? Ping? Anyone willing to review this? ------------- PR: https://git.openjdk.org/jdk/pull/11143 From aph at openjdk.org Tue Nov 22 16:56:04 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Nov 2022 16:56:04 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v17] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <VQ5f0Bk-vm2h40aZkzFRs8QP2h1KHBjQ9oKy1753LWk=.06383738-b427-4d2f-b4a9-d0c81df56356@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 - Fix bad merge. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/86ce5bbd..04320c7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=15-16 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From duke at openjdk.org Tue Nov 22 17:20:56 2022 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 22 Nov 2022 17:20:56 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v22] In-Reply-To: <encTlnf9qtjfjtVa-jDoWJMcUc6AwRtSDj7tk_OyBM0=.9728a3c6-6009-4873-9cb3-28ac8c262282@github.com> References: <wDtmoM8mMKTxF31fFaHywCrQgFOV1wrL5wCV4ytlrEg=.9ae8ecbf-9386-4aee-9764-2ebafb541e07@github.com> <encTlnf9qtjfjtVa-jDoWJMcUc6AwRtSDj7tk_OyBM0=.9728a3c6-6009-4873-9cb3-28ac8c262282@github.com> Message-ID: <pyx9hvqbMbB8kvwH1WaWugTgbOjOqSmWCIooIeST7CY=.add869f5-b444-4ef9-8513-53902c7b040c@github.com> On Thu, 17 Nov 2022 20:42:27 GMT, Volodymyr Paprotski <duke at openjdk.org> wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. 
>> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > remove early return @robcasloz Update to [JDK-8297417](https://bugs.openjdk.org/browse/JDK-8297417) (since I don't have an account on the bugtracker yet to update there) Not able to reproduce it on Linux yet. The seed should make it deterministic.. but nothing. Resurrecting my Windows sandbox to see if I can reproduce on Windows (only difference on Windows is the intrinsic function register linkage. However problem there would make the problem _very_ deterministic.. 
I think) ------------- PR: https://git.openjdk.org/jdk/pull/10582 From aph at openjdk.org Tue Nov 22 17:23:59 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Nov 2022 17:23:59 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v18] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <faIMl3qGdVtkP66lVoSSGFAEmB7dpu9V_s3S0kUfBws=.58dcceca-b628-46b1-a6af-991d1a1b7a48@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Remove incorrect assertion. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/04320c7b..afad922c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=16-17 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Tue Nov 22 17:27:14 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Nov 2022 17:27:14 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v19] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <qmpcPBwjde58ASawZ5-8u7QrrLe8fMDumeFeSGzp7GE=.edca51ee-aae6-4ad7-905a-0080671ee30f@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/aarch64/aarch64.ad Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/afad922c..7c49e676 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=17-18 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Tue Nov 22 17:31:05 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Nov 2022 17:31:05 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v20] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <VIPgCOyHL1A1zGtBaTIJ7njWten6tHdKyQvObYRbTBI=.b1fa69ee-fa81-423b-885e-307ee730a478@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update src/java.base/share/classes/java/lang/Thread.java Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/7c49e676..b06ea927 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=18-19 Stats: 8 lines in 1 file changed: 5 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Tue Nov 22 17:34:22 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Nov 2022 17:34:22 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v21] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <P3uXRY2QdSGl5AGCu_UxBxV_sipWpZXxr_k4uaTRSlE=.390bd458-4e8f-40dd-9f62-5b64cedff5f3@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 - Update src/java.base/share/classes/java/lang/VirtualThread.java Co-authored-by: Alan Bateman <Alan.Bateman at oracle.com> - Feedback from reviewers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/b06ea927..dc577736 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=19-20 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From mdoerr at openjdk.org Tue Nov 22 18:19:44 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 22 Nov 2022 18:19:44 GMT Subject: RFR: 8297445: PPC64: Represent Registers as values Message-ID: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11 ([JDK-8297426](https://bugs.openjdk.org/browse/JDK-8297426)). 
------------- Commit messages: - 8297445: PPC64: Represent Registers as values Changes: https://git.openjdk.org/jdk/pull/11297/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11297&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297445 Stats: 814 lines in 13 files changed: 163 ins; 429 del; 222 mod Patch: https://git.openjdk.org/jdk/pull/11297.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11297/head:pull/11297 PR: https://git.openjdk.org/jdk/pull/11297 From dcubed at openjdk.org Tue Nov 22 20:59:47 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 22 Nov 2022 20:59:47 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v7] In-Reply-To: <80rNrQw5nENgWF8B1AMpS9DTjCEqttpX-O86Hn4HtL8=.034efc1e-6582-4bb6-ab99-39040d2951b7@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <80rNrQw5nENgWF8B1AMpS9DTjCEqttpX-O86Hn4HtL8=.034efc1e-6582-4bb6-ab99-39040d2951b7@github.com> Message-ID: <wRR8h_AMmgT51OcdT5g0g9aFdGiS1ieLd7Mety95hTw=.8e6171d0-9f85-44b6-b129-e2355ab9b5ab@github.com> On Tue, 22 Nov 2022 05:57:37 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> The can_support_virtual_thread was initially implemented as an onload capability. >> It is why this capability does not work for the agents loaded into running VM. >> The fix is to move it from `onload` to `always`capabilities list. >> >> Testing: >> New test is added: VirtualStartThreadTest. >> TBD: mach5 jvmti, jdi and tier1-6 tests. > > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into br19 > Merge > - removed thread->vthread() != NULL from JvmtiVirtualThreadEventMark constructor > - minor update for unnamed threads in jvmti_common.h > - fixed a trailing white space issue > - extended VirtualThreadStartTest to support more configs; fixed issue in jvmtiExport.cpp > - roll back unintended VirtualThread.java file update > - simplified VirtualThreadStartTest > - 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM src/hotspot/share/prims/jvmtiExport.cpp line 202: > 200: JvmtiEventMark(thread) { > 201: _jthread = to_jobject(thread->vthread()); > 202: assert(thread->vthread() != NULL || thread->threadObj() == NULL, "sanity check"); Seems a little strange to me that L202 is after L201. You're asserting that `thread->vthread() != NULL` after passing it to a `to_jobject()` call. ------------- PR: https://git.openjdk.org/jdk/pull/11246 From dcubed at openjdk.org Tue Nov 22 22:28:27 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 22 Nov 2022 22:28:27 GMT Subject: RFR: 8297106: Remove the -Xcheck:jni local reference capacity checking [v3] In-Reply-To: <l3xmqRE7yL-cYPAB3c73hQXU01bqircgZB7vSQGgBpw=.a623c9fa-1b09-4405-9307-26986800db67@github.com> References: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> <l3xmqRE7yL-cYPAB3c73hQXU01bqircgZB7vSQGgBpw=.a623c9fa-1b09-4405-9307-26986800db67@github.com> Message-ID: <b3atmqI5IZYyk0zGgkg-EKHp0puDXmyf-GPEPNhdRbE=.37444c02-0fbb-46ee-80a9-e6d6068f329e@github.com> On Tue, 22 Nov 2022 06:39:40 GMT, David Holmes <dholmes at openjdk.org> wrote: >> This PR removes the "fake" planned capacity checking mechanism. Please see the JBS issue for the detailed discussion. >> >> Testing: tiers 1-3 >> >> Thanks. > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Manpage update > - Merge branch 'master' into 8297106-ensure-local-capacity > - Removed additional test that no longer applies. > - Forgot to commit deleted test file. > - Forgot to commit removed test. > - 8297106: Remove the -Xcheck:jni local reference capacity checking Thumbs up. The removal of -Xcheck:jni local reference capacity checking appears to be clean and easy to review. The hard part is convincing yourself that there isn't a way to get something useful out of this idea. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.org/jdk/pull/11259 From dholmes at openjdk.org Tue Nov 22 22:35:37 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 22 Nov 2022 22:35:37 GMT Subject: RFR: 8297106: Remove the -Xcheck:jni local reference capacity checking [v3] In-Reply-To: <b3atmqI5IZYyk0zGgkg-EKHp0puDXmyf-GPEPNhdRbE=.37444c02-0fbb-46ee-80a9-e6d6068f329e@github.com> References: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> <l3xmqRE7yL-cYPAB3c73hQXU01bqircgZB7vSQGgBpw=.a623c9fa-1b09-4405-9307-26986800db67@github.com> <b3atmqI5IZYyk0zGgkg-EKHp0puDXmyf-GPEPNhdRbE=.37444c02-0fbb-46ee-80a9-e6d6068f329e@github.com> Message-ID: <ZtrPJdhQfqpNyIoOgCKxML_GBGYo6rIvV-rAjED_C38=.c4054020-1bba-43d3-80b4-e5654e1c7ede@github.com> On Tue, 22 Nov 2022 22:24:31 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote: >> David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Manpage update >> - Merge branch 'master' into 8297106-ensure-local-capacity >> - Removed additional test that no longer applies. 
>> - Forgot to commit deleted test file. >> - Forgot to commit removed test. >> - 8297106: Remove the -Xcheck:jni local reference capacity checking > > Thumbs up. The removal of -Xcheck:jni local reference capacity checking > appears to be clean and easy to review. > > The hard part is convincing yourself that there isn't a way to get > something useful out of this idea. Thanks @dcubed-ojdk > The hard part is convincing yourself that there isn't a way to get something useful out of this idea. Yep totally agree. I kept trying but couldn't come up with anything useful. Ran it by some other folk and they agreed. All this feature seems to do is introduce periodic new test failures (across all releases). ------------- PR: https://git.openjdk.org/jdk/pull/11259 From sspitsyn at openjdk.org Tue Nov 22 23:19:30 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 22 Nov 2022 23:19:30 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v7] In-Reply-To: <wRR8h_AMmgT51OcdT5g0g9aFdGiS1ieLd7Mety95hTw=.8e6171d0-9f85-44b6-b129-e2355ab9b5ab@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> <80rNrQw5nENgWF8B1AMpS9DTjCEqttpX-O86Hn4HtL8=.034efc1e-6582-4bb6-ab99-39040d2951b7@github.com> <wRR8h_AMmgT51OcdT5g0g9aFdGiS1ieLd7Mety95hTw=.8e6171d0-9f85-44b6-b129-e2355ab9b5ab@github.com> Message-ID: <NXgyx_fx9P4bqrhbU2EdmOKA5Ca_lr7f5f0Wfl0r7nU=.d8670958-96e7-4579-9769-71215a4deb71@github.com> On Tue, 22 Nov 2022 20:57:16 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote: >> Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into br19 >> Merge >> - removed thread->vthread() != NULL from JvmtiVirtualThreadEventMark constructor >> - minor update for unnamed threads in jvmti_common.h >> - fixed a trailing white space issue >> - extended VirtualThreadStartTest to support more configs; fixed issue in jvmtiExport.cpp >> - roll back unintended VirtualThread.java file update >> - simplified VirtualThreadStartTest >> - 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM > > src/hotspot/share/prims/jvmtiExport.cpp line 202: > >> 200: JvmtiEventMark(thread) { >> 201: _jthread = to_jobject(thread->vthread()); >> 202: assert(thread->vthread() != NULL || thread->threadObj() == NULL, "sanity check"); > > Seems a little strange to me that L202 is after L201. You're asserting > that `thread->vthread() != NULL` after passing it to a `to_jobject()` call. I do not think this matters. But I will revert the order of these line to make you happy. :) Thank you for looking at the fix! ------------- PR: https://git.openjdk.org/jdk/pull/11246 From sspitsyn at openjdk.org Tue Nov 22 23:27:56 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 22 Nov 2022 23:27:56 GMT Subject: RFR: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM [v8] In-Reply-To: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> Message-ID: <TVz7k9b0LO_s8TCH7eGbsq6dE16S1HcyQ3FlonST0JM=.d0ff8f70-ed6b-4a92-a130-59f86791a2e4@github.com> > The can_support_virtual_thread was initially implemented as an onload capability. > It is why this capability does not work for the agents loaded into running VM. > The fix is to move it from `onload` to `always`capabilities list. 
> > Testing: > New test is added: VirtualStartThreadTest. > TBD: mach5 jvmti, jdi and tier1-6 tests. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: move assert in JvmtiVirtualThreadEventMark one line up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11246/files - new: https://git.openjdk.org/jdk/pull/11246/files/7c659909..321427c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11246&range=06-07 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11246.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11246/head:pull/11246 PR: https://git.openjdk.org/jdk/pull/11246 From sspitsyn at openjdk.org Tue Nov 22 23:49:41 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 22 Nov 2022 23:49:41 GMT Subject: Integrated: 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM In-Reply-To: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> References: <1I_7hZCOwnCc-sMv2nqxC2_J6RJMFrvoepKYPRFQnFs=.2149fccd-7da2-4253-b569-72980d4b30a1@github.com> Message-ID: <4-ppaFurDcYdmOmP2EGt7iGblAPgdaqDAaroDCpG1Ps=.05e984c6-efb8-4865-a75a-de3f47b2d2b8@github.com> On Sat, 19 Nov 2022 07:08:38 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: > The can_support_virtual_thread was initially implemented as an onload capability. > It is why this capability does not work for the agents loaded into running VM. > The fix is to move it from `onload` to `always`capabilities list. > > Testing: > New test is added: VirtualStartThreadTest. > TBD: mach5 jvmti, jdi and tier1-6 tests. This pull request has now been integrated. 
Changeset: e661c5a3 Author: Serguei Spitsyn <sspitsyn at openjdk.org> URL: https://git.openjdk.org/jdk/commit/e661c5a3d0c8683043e238b669ae1bc59d94a682 Stats: 228 lines in 5 files changed: 220 ins; 5 del; 3 mod 8296323: JVMTI can_support_virtual_threads not available for agents loaded into running VM Reviewed-by: alanb, rrich ------------- PR: https://git.openjdk.org/jdk/pull/11246 From amenkov at openjdk.org Tue Nov 22 23:58:27 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 22 Nov 2022 23:58:27 GMT Subject: Integrated: 8296265: Use modern HTML in the JVMTI spec In-Reply-To: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com> References: <ldnGThG9EamROci--Rz6dG6hqX0AGeOvSq-uXowXyLI=.e221ad5f-d843-425d-be7f-6db879b118db@github.com> Message-ID: <FoiBy_HveUlHuVqqpzujExqNDXhBKtubQKuv3ZeJcP4=.232976cd-1e4c-4ec7-9848-6822933fdbfd@github.com> On Fri, 11 Nov 2022 00:43:33 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > Changes: > - removed `<b>` from TOC; > - added CSS style for TOC (to simplify customization, currently it's empty); > - removed `<b>` from function list (per Phase); > - removed `<b>` from list of events; > - introduced CSS style for bold text, replaced `<b>` tags with `<span class="bold">`; > - update transformation rule for `"b"` elements to use `"span class=bold"` (to handle `<b>` tags in source XML file); > - dropped duplicate `"b"` transform. This pull request has now been integrated.
Changeset: 09f70dad Author: Alex Menkov <amenkov at openjdk.org> URL: https://git.openjdk.org/jdk/commit/09f70dad2fe3f0691afacded6c38f61fa8a0d28d Stats: 53 lines in 1 file changed: 2 ins; 16 del; 35 mod 8296265: Use modern HTML in the JVMTI spec Reviewed-by: sspitsyn, kevinw ------------- PR: https://git.openjdk.org/jdk/pull/11099 From sspitsyn at openjdk.org Wed Nov 23 00:32:53 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 23 Nov 2022 00:32:53 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 Message-ID: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> This problem has two sides. One is that the `VirtualThread::run()` caches the field `notifyJvmtiEvents` value. It caused the native method `notifyJvmtiUnmountBegin()` not called after the field `notifyJvmtiEvents` value has been set to `true` when an agent library is loaded into running VM. The fix is to get rid of this caching. Another is that enabling `notifyJvmtiEvents` notifications needs a synchronization. Otherwise, a VTMS transition start can be missed which will cause some asserts to fire. The fix is to use a JvmtiVTMSTransitionDisabler helper for sync. Testing: The originally failed tests are passed now: runtime/vthread/RedefineClass.java runtime/vthread/TestObjectAllocationSampleEvent.java In progress: Run the tiers 1-6 to make sure there are no regression.
------------- Commit messages: - 8297286: runtime/vthread tests crashing after JDK-8296324 Changes: https://git.openjdk.org/jdk/pull/11304/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11304&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297286 Stats: 6 lines in 3 files changed: 1 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11304.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11304/head:pull/11304 PR: https://git.openjdk.org/jdk/pull/11304 From lmesnik at openjdk.org Wed Nov 23 02:21:10 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 23 Nov 2022 02:21:10 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 In-Reply-To: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> References: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> Message-ID: <kOqnmcmgsWx3ci-8x10-_4SXgVZ03vQ17LiQ3BMOEvM=.e4c9c6be-85b2-461b-ab77-82e3f1e2869b@github.com> On Wed, 23 Nov 2022 00:24:28 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: > This problem has two sides. > One is that the `VirtualThread::run()` caches the field `notifyJvmtiEvents` value. > It caused the native method `notifyJvmtiUnmountBegin()` not called after the field `notifyJvmtiEvents` > value has been set to `true` when an agent library is loaded into running VM. > The fix is to get rid of this caching. > Another is that enabling `notifyJvmtiEvents` notifications needs a synchronization. > Otherwise, a VTMS transition start can be missed which will cause some asserts to fire. > The fix is to use a JvmtiVTMSTransitionDisabler helper for sync. > > Testing: > The originally failed tests are passed now: > > runtime/vthread/RedefineClass.java > runtime/vthread/TestObjectAllocationSampleEvent.java > > In progress: > Run the tiers 1-6 to make sure there are no regression. Marked as reviewed by lmesnik (Reviewer).
src/java.base/share/classes/java/lang/VirtualThread.java line 273: > 271: private void run(Runnable task) { > 272: assert state == RUNNING; > 273: boolean notifyJvmti = notifyJvmtiEvents; Don't we have same issue in yieldContinuation() method? (line 396) ------------- PR: https://git.openjdk.org/jdk/pull/11304 From lmesnik at openjdk.org Wed Nov 23 02:24:28 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 23 Nov 2022 02:24:28 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 In-Reply-To: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> References: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> Message-ID: <Ge3Ki5__vCNskNS6dBH6WTnVRclpVLqR-BqjACV-XdM=.be14ae76-019b-47a8-9914-95914acfc12e@github.com> On Wed, 23 Nov 2022 00:24:28 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: > This problem has two sides. > One is that the `VirtualThread::run()` caches the field `notifyJvmtiEvents` value. > It caused the native method `notifyJvmtiUnmountBegin()` not called after the field `notifyJvmtiEvents` > value has been set to `true` when an agent library is loaded into running VM. > The fix is to get rid of this caching. > Another is that enabling `notifyJvmtiEvents` notifications needs a synchronization. > Otherwise, a VTMS transition start can be missed which will cause some asserts to fire. > The fix is to use a JvmtiVTMSTransitionDisabler helper for sync. > > Testing: > The originally failed tests are passed now: > > runtime/vthread/RedefineClass.java > runtime/vthread/TestObjectAllocationSampleEvent.java > > In progress: > Run the tiers 1-6 to make sure there are no regression. Needed to check yieldContinuation() method for same issue. ------------- Changes requested by lmesnik (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11304 From dholmes at openjdk.org Wed Nov 23 02:30:33 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 23 Nov 2022 02:30:33 GMT Subject: RFR: 8296671: [JFR] jdk.ContainerConfiguration event should include host total memory In-Reply-To: <F5MuXpAPEb_Bqj1FrkNaf5wwkHZdc38xIjOLzUq5TX0=.15f21c71-3048-4422-ba1e-9a116f693f21@github.com> References: <oeClK9flGtbMJwedbMytX5I_94NaXNkNQ9XMjACBsSI=.7d360f67-460a-4631-9743-32e970b16a9f@github.com> <F5MuXpAPEb_Bqj1FrkNaf5wwkHZdc38xIjOLzUq5TX0=.15f21c71-3048-4422-ba1e-9a116f693f21@github.com> Message-ID: <_LKZlozpVxgccKWpWjQQ3c9zdM_EIKpQZyPM28MkUTo=.a1d715f0-83fa-42ee-b661-4ec64e3abf7b@github.com> On Tue, 22 Nov 2022 15:52:46 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: >> Please review this addition to the jdk.ContainerConfigration event which adds information >> about the container host. Specifically, the total amount of memory of the host system. >> >> Testing: >> - [x] New test case (passed, fails before) >> - [x] JFR tests. Passed. >> >> Thoughts? > > Ping? Anyone willing to review this? @jerboaa your original RFR email didn't go to hotspot-jfr-dev so this has only just appeared on the jfr mailing list. 
------------- PR: https://git.openjdk.org/jdk/pull/11143 From svkamath at openjdk.org Wed Nov 23 04:30:14 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Wed, 23 Nov 2022 04:30:14 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" Message-ID: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" ------------- Commit messages: - Updated code to fix windows build issue - Removed test from ProblemList-Xcomp file - Fix for JDK-829531 Changes: https://git.openjdk.org/jdk/pull/11301/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11301&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295351 Stats: 19 lines in 2 files changed: 12 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/11301.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11301/head:pull/11301 PR: https://git.openjdk.org/jdk/pull/11301 From svkamath at openjdk.org Wed Nov 23 04:30:15 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Wed, 23 Nov 2022 04:30:15 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" In-Reply-To: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> Message-ID: <hxEaxA2hjYIXr0vfIcVfyAcfoK9RYpVH8URLeVkElSk=.0ca3b358-29f1-49ae-9f9e-e76a053280d5@github.com> On Tue, 22 Nov 2022 21:52:59 GMT, Smita Kamath <svkamath at openjdk.org> wrote: > 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" Hi All, I have updated f2hf and hf2f methods in sharedRuntime.cpp as a fix for the error unexpected result of converting. Kindly review this patch and provide feedback. Thank you. 
Regards, Smita ------------- PR: https://git.openjdk.org/jdk/pull/11301 From duke at openjdk.org Wed Nov 23 04:30:16 2022 From: duke at openjdk.org (ExE Boss) Date: Wed, 23 Nov 2022 04:30:16 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" In-Reply-To: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> Message-ID: <g-fDDP-80KT0q-9uRqfx8HW86XCkw1KAnQdAtXfgnVs=.00476ce0-106e-453a-adf1-896d9d0aef85@github.com> On Tue, 22 Nov 2022 21:52:59 GMT, Smita Kamath <svkamath at openjdk.org> wrote: > 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" src/hotspot/share/runtime/sharedRuntime.cpp line 531: > 529: return bits.f; > 530: } > 531: } Wrong indentation: Suggestion: } else if (hf_exp == 16) { if (hf_significand_bits == 0) { bits.i = 0x7f800000; return sign * bits.f; } else { bits.i = (hf_sign_bit << 16) | 0x7f800000 | (hf_significand_bits << significand_shift); return bits.f; } } ------------- PR: https://git.openjdk.org/jdk/pull/11301 From dholmes at openjdk.org Wed Nov 23 04:49:24 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 23 Nov 2022 04:49:24 GMT Subject: RFR: 8296886: Fix various include sort order issues [v2] In-Reply-To: <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> Message-ID: <ZGtggo6m1dr3BWr4BqX6LmKm-G3TvlfkTonNHcvvNms=.5a4c0228-e5f0-4fc6-ace0-07374cd8b808@github.com> On Wed, 16 Nov 2022 16:17:48 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up
some of the issues. >> >> *EDIT*: The below discussion has been deferred out of this PR. Now this only deals with fixing the placement and sorting of includes, plus some surrounding blank lines. >> >> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share , just like the other platform-independent headers in HotSpot. >> >> While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Cleanups > - Merge remote-tracking branch 'upstream/master' into various_include_order_fixes > - Various include order fixes Seems okay and generally uncontroversial :) I'm not sure why conditional includes (that don't rely on macros.hpp) need to come at the end rather than in normal sort order? I don't care either way but a rationale for this would be good if it is to be the preferred style. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11108 From kbarrett at openjdk.org Wed Nov 23 05:02:09 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 23 Nov 2022 05:02:09 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <nVCoCE9doWoQ54qhPSbn1xM79YLzhggfELps4CpI53o=.3af1e9bf-5389-439e-bd78-1b4fce336c2f@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <gay-N6xDnfKHcngB9ddJIZD6Jfg2m_ZCzZn1gWPFN-o=.785036e8-1d1d-41d3-bac3-211b9d03cd71@github.com> <XXVqN4ByCrB34JRZSgiNYWsdrwEOTKjo5u81sTFG5bE=.7748c17a-4a13-42aa-b10d-219fe6775da2@github.com> <nVCoCE9doWoQ54qhPSbn1xM79YLzhggfELps4CpI53o=.3af1e9bf-5389-439e-bd78-1b4fce336c2f@github.com> Message-ID: <4NLoUD7YZRPKuVc08q_QnsbuPpb0wZIdmAuMN2tqM_c=.a34c13d0-0a6b-495e-b48e-e88debf6b4da@github.com> On Mon, 21 Nov 2022 02:43:12 GMT, Julian Waters <jwaters at openjdk.org> wrote: > Out of curiosity, is there a way to get the discussion on approving the use of alignas back up? [...] A PR to address JDK-8252584 would be welcomed by me. Just do the process for Style Guide changes (see the Style Guide or previous PRs for such). I don't expect it would be very controversial. I think the only reason it hasn't already happened is because nobody has gotten around to it, or felt the need for it. JDK-8250269 touches a bit more code (mostly in stubGenerator_x86_64 and macroAssembler_x86_32), but also seems like it should be straightforward. > > The various MSVC-conditional direct uses of __declspec(align(N)) should probably currently be using ATTRIBUTE_ALIGNED. > > The instances of `__declspec(align())` changed here are in the native libraries written in C, not within HotSpot itself. 
From what I can see at least HotSpot never uses compiler alignment attributes directly and always strictly sticks to `ATTRIBUTE_ALIGNED` (which is probably a good thing) You are right that the Windows-conditionalized uses are in non-HotSpot code. I missed that context when skimming through the changes. Since Visual Studio is always C++ (even though the shared files are written as C), using alignas with appropriate conditionalization in those files should be fine. ------------- PR: https://git.openjdk.org/jdk/pull/11081 From dholmes at openjdk.org Wed Nov 23 05:09:23 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 23 Nov 2022 05:09:23 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" In-Reply-To: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> Message-ID: <zzT0IfsPETkFu5NQWDkR9DUyl8fs_8vdwKFcfAium2k=.e7eba1e5-6aeb-461b-a24e-f02b885b739a@github.com> On Tue, 22 Nov 2022 21:52:59 GMT, Smita Kamath <svkamath at openjdk.org> wrote: > 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" src/hotspot/share/runtime/sharedRuntime.cpp line 455: > 453: union {jfloat f; juint i;} bits; > 454: bits.f = x; > 455: jint doppel = bits.i; Doesn't the conversion from unsigned to signed risk a compiler warning being emitted? Can't you just use the existing `JavaValue` type to perform the union conversion trick? 
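As a hedged illustration of the question above (not the code in the patch, and not the `JavaValue` type it mentions): a minimal standalone sketch of reading and writing a float's bit pattern without a union and without an unsigned-to-signed conversion. The helper names `float_bits` and `bits_to_float` are hypothetical, and `std::memcpy` is used here only as a portable stand-in for whatever facility HotSpot would actually use.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the raw 32-bit pattern of a float.
// memcpy between objects of equal size avoids union type punning.
static inline std::uint32_t float_bits(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "unexpected float size");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// Hypothetical helper: build a float from a raw 32-bit pattern.
static inline float bits_to_float(std::uint32_t bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "unexpected float size");
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

Keeping the intermediate value unsigned throughout sidesteps the signedness question; a signed view, if one is needed, can then be produced by an explicit cast at the point of use.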
------------- PR: https://git.openjdk.org/jdk/pull/11301 From kbarrett at openjdk.org Wed Nov 23 05:24:24 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 23 Nov 2022 05:24:24 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <UwMOA0K5cYSIeTkRgIl6QlPe2iTTA5z0vCH3jzUmx4E=.2b35fc6c-e96a-4807-863f-583631128a4e@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <0fVP40VVRuOoZCEJ1M3BLubshBHbD4m_lj-j1qaGTTk=.391ade50-b3fe-4d4e-ae71-ba8a975a31cd@github.com> <mowWW-lO9a5Zo4iT-sbh1YTZFXO4UjKuJK7OpfrhFFo=.c7e5430a-de42-4465-ba85-10dfb8c71184@github.com> <UwMOA0K5cYSIeTkRgIl6QlPe2iTTA5z0vCH3jzUmx4E=.2b35fc6c-e96a-4807-863f-583631128a4e@github.com> Message-ID: <klNxsMmpOjXNtFhaX2Kqt-YqYSBCyCAoJkqnmFr_IeY=.b55c0f1a-4331-4d65-af05-a20b59f73f57@github.com> On Mon, 14 Nov 2022 12:20:54 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> Sorry my eyes must be playing tricks on me. >> >> Why did you need to add this here? > > It's to avoid redefining the linkage as static in os_windows.cpp (where it's implemented) after an extern declaration (inside the class), which is forbidden by C++11: > >> The linkages implied by successive declarations for a given entity shall agree. That is, within a given scope, each declaration declaring the same variable name or the same overloading of a function name shall imply the same linkage. > > While 2019 by default seems to ignore this rule and accepts the conflicting linkage as a language extension, this can cause issues with newer and stricter versions of the Visual C++ compiler (especially with -permissive- passed during compilation, which Magnus and Daniel have pointed out in another discussion will become the default mode of compilation in the future).
It's not possible to declare a static friend inside a class, so the addition above takes advantage of another C++ feature instead: > >> §11.3/4 [class.friend] > A function first declared in a friend declaration has external linkage (3.5). Otherwise, the function retains its previous linkage (7.1.1). I think the problem here is the friend declaration, which doesn't look like it's needed and could be deleted. ------------- PR: https://git.openjdk.org/jdk/pull/11081 From njian at openjdk.org Wed Nov 23 06:05:21 2022 From: njian at openjdk.org (Ningsheng Jian) Date: Wed, 23 Nov 2022 06:05:21 GMT Subject: RFR: 8296208: AArch64: Enable SHA512 intrinsic by default on supported hardware In-Reply-To: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com> References: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com> Message-ID: <5oorVhUmqur26ixXgu3fri_F6NQZorvuJLiqxBBDTUM=.7947f672-5246-47ce-9b72-63116db1d121@github.com> On Tue, 1 Nov 2022 06:08:26 GMT, Hao Sun <haosun at openjdk.org> wrote: > SHA512 intrinsic for AArch64 was implemented in JDK-8165404. But it was not auto-enabled due to the lack of full test on real hardware. In this patch, we set this intrinsic enabled by default on hardware with sha512 feature support, after we did the following evaluation. > > 1) tier1~3 passed without new failures. > > 2) we ran the JMH test case MessageDigests.java on all available sha512 feature supported CPUs on our hands including Neoverse V1, Neoverse N2 and Apple silicon(M1). We witnessed about 1.3x ~ 3x performance uplifts. Here shows the data on V1.
> >
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt    Before     After   Units
> MessageDigests.digest               SHA-384        64     DEFAULT  thrpt    5  2381.028  6161.576  ops/ms
> MessageDigests.digest               SHA-384     16384     DEFAULT  thrpt    5    20.641    60.493  ops/ms
> MessageDigests.digest               SHA-512        64     DEFAULT  thrpt    5  2407.225  6140.680  ops/ms
> MessageDigests.digest               SHA-512     16384     DEFAULT  thrpt    5    20.633    60.942  ops/ms
> MessageDigests.getAndDigest         SHA-384        64     DEFAULT  thrpt    5  1962.740  4714.510  ops/ms
> MessageDigests.getAndDigest         SHA-384     16384     DEFAULT  thrpt    5    20.474    61.360  ops/ms
> MessageDigests.getAndDigest         SHA-512        64     DEFAULT  thrpt    5  1949.511  4552.723  ops/ms
> MessageDigests.getAndDigest         SHA-512     16384     DEFAULT  thrpt    5    20.477    59.693  ops/ms
LGTM ------------- Marked as reviewed by njian (Committer). PR: https://git.openjdk.org/jdk/pull/10925 From haosun at openjdk.org Wed Nov 23 06:31:25 2022 From: haosun at openjdk.org (Hao Sun) Date: Wed, 23 Nov 2022 06:31:25 GMT Subject: RFR: 8296208: AArch64: Enable SHA512 intrinsic by default on supported hardware In-Reply-To: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com> References: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com> Message-ID: <GuaXXo_rcDBkBb2jgUZiycz6Uta9g__xfBT0J3qGRyQ=.1118e305-4b67-4cca-94b9-c4938da034b8@github.com> On Tue, 1 Nov 2022 06:08:26 GMT, Hao Sun <haosun at openjdk.org> wrote: > SHA512 intrinsic for AArch64 was implemented in JDK-8165404. But it was not auto-enabled due to the lack of full test on real hardware. In this patch, we set this intrinsic enabled by default on hardware with sha512 feature support, after we did the following evaluation. > > 1) tier1~3 passed without new failures. > > 2) we ran the JMH test case MessageDigests.java on all available sha512 feature supported CPUs on our hands including Neoverse V1, Neoverse N2 and Apple silicon(M1). We witnessed about 1.3x ~ 3x performance uplifts. Here shows the data on V1.
> >
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt    Before     After   Units
> MessageDigests.digest               SHA-384        64     DEFAULT  thrpt    5  2381.028  6161.576  ops/ms
> MessageDigests.digest               SHA-384     16384     DEFAULT  thrpt    5    20.641    60.493  ops/ms
> MessageDigests.digest               SHA-512        64     DEFAULT  thrpt    5  2407.225  6140.680  ops/ms
> MessageDigests.digest               SHA-512     16384     DEFAULT  thrpt    5    20.633    60.942  ops/ms
> MessageDigests.getAndDigest         SHA-384        64     DEFAULT  thrpt    5  1962.740  4714.510  ops/ms
> MessageDigests.getAndDigest         SHA-384     16384     DEFAULT  thrpt    5    20.474    61.360  ops/ms
> MessageDigests.getAndDigest         SHA-512        64     DEFAULT  thrpt    5  1949.511  4552.723  ops/ms
> MessageDigests.getAndDigest         SHA-512     16384     DEFAULT  thrpt    5    20.477    59.693  ops/ms
Could you help to review this patch when you got a chance? Maybe @nick-arm or @theRealAph Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10925 From kbarrett at openjdk.org Wed Nov 23 07:14:32 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 23 Nov 2022 07:14:32 GMT Subject: RFR: 8296886: Fix various include sort order issues [v2] In-Reply-To: <ZGtggo6m1dr3BWr4BqX6LmKm-G3TvlfkTonNHcvvNms=.5a4c0228-e5f0-4fc6-ace0-07374cd8b808@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> <ZGtggo6m1dr3BWr4BqX6LmKm-G3TvlfkTonNHcvvNms=.5a4c0228-e5f0-4fc6-ace0-07374cd8b808@github.com> Message-ID: <Iz3sVAPFBaDf5dAbv7fpZzf6zDLC4CfJ378w6vuieWM=.04c5e321-45b9-42af-8707-4c00fa6a49d9@github.com> On Wed, 23 Nov 2022 04:47:11 GMT, David Holmes <dholmes at openjdk.org> wrote: > I'm not sure why conditional includes (that don't rely on macros.hpp) need to come at the end rather than in normal sort order? I don't care either way but a rationale for this would be good if it is to be the preferred style. Because the Style Guide says: * Put conditional inclusions (`#if ...`) at the end of the include list.
I think most of our conditional includes these days are to support conditional features. It makes sense to group all the additional includes related to a feature. ------------- PR: https://git.openjdk.org/jdk/pull/11108 From kbarrett at openjdk.org Wed Nov 23 07:21:27 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 23 Nov 2022 07:21:27 GMT Subject: RFR: 8296886: Fix various include sort order issues [v2] In-Reply-To: <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> Message-ID: <AGubz9zAOrA4xdLxUemwyJ-Xf-QO2e7ycdTu03ZOers=.3897e5b5-1457-485c-a19c-b74211649ef3@github.com> On Wed, 16 Nov 2022 16:17:48 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. >> >> *EDIT*: The below discussion has been deferred out of this PR. Now this only deals with fixing the placement and sorting of includes, plus some surrounding blank lines. >> >> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. 
To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share , just like the other platform-independent headers in HotSpot. >> >> While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Cleanups > - Merge remote-tracking branch 'upstream/master' into various_include_order_fixes > - Various include order fixes Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/11108 From dholmes at openjdk.org Wed Nov 23 07:31:24 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 23 Nov 2022 07:31:24 GMT Subject: RFR: 8296886: Fix various include sort order issues [v2] In-Reply-To: <Iz3sVAPFBaDf5dAbv7fpZzf6zDLC4CfJ378w6vuieWM=.04c5e321-45b9-42af-8707-4c00fa6a49d9@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> <ZGtggo6m1dr3BWr4BqX6LmKm-G3TvlfkTonNHcvvNms=.5a4c0228-e5f0-4fc6-ace0-07374cd8b808@github.com> <Iz3sVAPFBaDf5dAbv7fpZzf6zDLC4CfJ378w6vuieWM=.04c5e321-45b9-42af-8707-4c00fa6a49d9@github.com> Message-ID: <U4uymnR5jYSg1zZFnsb1NVS0tY2yEvWU6--zqyVNmeI=.fc7a9e67-5aaf-494a-87ad-1482f868265c@github.com> On Wed, 23 Nov 2022 07:12:13 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > > I'm not sure why conditional includes (that don't rely on macros.hpp) need to come at the end rather than in normal sort order? I don't care either way but a rationale for this would be good if it is to be the preferred style. 
> > Because the Style Guide says: > > * Put conditional inclusions (`#if ...`) at the end of the include list. Ah I see. Thanks for that @kimbarrett > > I think most of our conditional includes these days are to support conditional features. It makes sense to group all the additional includes related to a feature. True. There are a lot of single includes but it makes sense to have one simple rule. ------------- PR: https://git.openjdk.org/jdk/pull/11108 From stuefe at openjdk.org Wed Nov 23 07:49:28 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 23 Nov 2022 07:49:28 GMT Subject: RFR: 8296886: Fix various include sort order issues [v2] In-Reply-To: <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> Message-ID: <qqYzwYoDhBX_LYrsD_5UBu5JY1jVuwKutBfNp_3u06U=.a08b7766-cf81-4c96-8d4b-6949f220dd3a@github.com> On Wed, 16 Nov 2022 16:17:48 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. >> >> *EDIT*: The below discussion has been deferred out of this PR. Now this only deals with fixing the placement and sorting of includes, plus some surrounding blank lines. >> >> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK have the files in include/ pushed up to the top of the sorted block, was that the file was included without specifying a directory. 
That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include gets included by specifying the path from src/hotspot/share , just like the other platform-independent headers in HotSpot. >> >> While going over the include headers I've also cleaned up surrounding whitespaces and incorrect include guards. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Cleanups > - Merge remote-tracking branch 'upstream/master' into various_include_order_fixes > - Various include order fixes Looks good and is an improvement. About the removed comments after includes, I am sometimes guilty of this to mark includes that came just for one specific little thing - the implied hope is that disentangling includes and removing unnecessary ones periodically is easier that way. However, this never worked. I wish we had an automatic include "GC" process to reduce the includes, like Ioi sometimes does manually. ------------- Marked as reviewed by stuefe (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11108 From sgehwolf at openjdk.org Wed Nov 23 09:03:23 2022 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 23 Nov 2022 09:03:23 GMT Subject: RFR: 8296671: [JFR] jdk.ContainerConfiguration event should include host total memory In-Reply-To: <F5MuXpAPEb_Bqj1FrkNaf5wwkHZdc38xIjOLzUq5TX0=.15f21c71-3048-4422-ba1e-9a116f693f21@github.com> References: <oeClK9flGtbMJwedbMytX5I_94NaXNkNQ9XMjACBsSI=.7d360f67-460a-4631-9743-32e970b16a9f@github.com> <F5MuXpAPEb_Bqj1FrkNaf5wwkHZdc38xIjOLzUq5TX0=.15f21c71-3048-4422-ba1e-9a116f693f21@github.com> Message-ID: <xe3RgT5kkmdDYUyeInP_Wirbbam8rg1zaFadE2TERrg=.82a5cf9d-a7c9-4c6c-b31c-2318f9680059@github.com> On Tue, 22 Nov 2022 15:52:46 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: >> Please review this addition to the jdk.ContainerConfigration event which adds information >> about the container host. Specifically, the total amount of memory of the host system. >> >> Testing: >> - [x] New test case (passed, fails before) >> - [x] JFR tests. Passed. >> >> Thoughts? > > Ping? Anyone willing to review this? > @jerboaa your original RFR email didn't go to hotspot-jfr-dev so this has only just appeared on the jfr mailing list. Ok, thanks David! 
------------- PR: https://git.openjdk.org/jdk/pull/11143 From kevinw at openjdk.org Wed Nov 23 09:53:06 2022 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 23 Nov 2022 09:53:06 GMT Subject: RFR: 8297106: Remove the -Xcheck:jni local reference capacity checking [v3] In-Reply-To: <l3xmqRE7yL-cYPAB3c73hQXU01bqircgZB7vSQGgBpw=.a623c9fa-1b09-4405-9307-26986800db67@github.com> References: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> <l3xmqRE7yL-cYPAB3c73hQXU01bqircgZB7vSQGgBpw=.a623c9fa-1b09-4405-9307-26986800db67@github.com> Message-ID: <LflDdTR9zKboGb2anstVwteRLB7LaWoaLpkAKMM1oiU=.534466b9-2269-439c-bd85-77d9e8fbffb9@github.com> On Tue, 22 Nov 2022 06:39:40 GMT, David Holmes <dholmes at openjdk.org> wrote: >> This PR removes the "fake" planned capacity checking mechanism. Please see the JBS issue for the detailed discussion. >> >> Testing: tiers 1-3 >> >> Thanks. > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Manpage update > - Merge branch 'master' into 8297106-ensure-local-capacity > - Removed additional test that no longer applies. > - Forgot to commit deleted test file. > - Forgot to commit removed test. > - 8297106: Remove the -Xcheck:jni local reference capacity checking Marked as reviewed by kevinw (Committer). Yes good to get rid of these as they have caused confusion and concern over the years, and don't represent a real problem. 
------------- PR: https://git.openjdk.org/jdk/pull/11259 From sspitsyn at openjdk.org Wed Nov 23 10:03:22 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 23 Nov 2022 10:03:22 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 In-Reply-To: <kOqnmcmgsWx3ci-8x10-_4SXgVZ03vQ17LiQ3BMOEvM=.e4c9c6be-85b2-461b-ab77-82e3f1e2869b@github.com> References: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> <kOqnmcmgsWx3ci-8x10-_4SXgVZ03vQ17LiQ3BMOEvM=.e4c9c6be-85b2-461b-ab77-82e3f1e2869b@github.com> Message-ID: <UAbgYFh2uFfbKiH0vpFRABzqC3O1_Vm1RYhNN99aJGY=.7ac0c84a-3ea4-45ee-a8bb-6c52ad515dff@github.com> On Wed, 23 Nov 2022 02:17:43 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote: >> This problem has two sides. >> One is that the `VirtualThread::run()` caches the field `notifyJvmtiEvents` value. >> It caused the native method `notifyJvmtiUnmountBegin()` not called after the field `notifyJvmtiEvents` >> value has been set to `true` when an agent library is loaded into running VM. >> The fix is to get rid of this caching. >> Another is that enabling `notifyJvmtiEvents` notifications needs a synchronization. >> Otherwise, a VTMS transition start can be missed which will cause some asserts to fire. >> The fix is to use a JvmtiVTMSTransitionDisabler helper for sync. >> >> Testing: >> The originally failed tests are passed now: >> >> runtime/vthread/RedefineClass.java >> runtime/vthread/TestObjectAllocationSampleEvent.java >> >> In progress: >> Run the tiers 1-6 to make sure there are no regressions. > > src/java.base/share/classes/java/lang/VirtualThread.java line 273: > >> 271: private void run(Runnable task) { >> 272: assert state == RUNNING; >> 273: boolean notifyJvmti = notifyJvmtiEvents; > > Don't we have same issue in yieldContinuation() method? (line 396) Good point. I'll check and make an update if needed. Thank you for looking at it.
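The caching problem discussed in this thread can be sketched in isolation. The class below is a hypothetical stand-in, not the `VirtualThread` code: `notifyEvents` plays the role of `notifyJvmtiEvents`, and the two `run...` helpers and the event strings are invented for illustration. It assumes only that the flag can flip to `true` while a task is already running.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names do not come from the JDK sources.
final class NotifyFlagSketch {
    // Stand-in for notifyJvmtiEvents: may become true while a task is running.
    static volatile boolean notifyEvents = false;

    // Reads the flag once up front, so a later change is never observed.
    static List<String> runWithCachedFlag(Runnable task) {
        List<String> events = new ArrayList<>();
        boolean notify = notifyEvents; // cached copy
        if (notify) events.add("mount");
        task.run();
        if (notify) events.add("unmount");
        return events;
    }

    // Re-reads the flag at every notification point.
    static List<String> runWithFreshReads(Runnable task) {
        List<String> events = new ArrayList<>();
        if (notifyEvents) events.add("mount");
        task.run();
        if (notifyEvents) events.add("unmount");
        return events;
    }

    public static void main(String[] args) {
        // Simulates an agent enabling notifications in the middle of the task.
        Runnable enableDuringTask = () -> notifyEvents = true;

        notifyEvents = false;
        System.out.println(runWithCachedFlag(enableDuringTask)); // prints []

        notifyEvents = false;
        System.out.println(runWithFreshReads(enableDuringTask)); // prints [unmount]
    }
}
```

With the cached read, enabling the flag while the task runs produces no notifications at all, which matches the missed `notifyJvmtiUnmountBegin()` call described above. Re-reading the flag picks up the change but yields an end notification with no matching start, which is presumably why the fix also adds synchronization around enabling the notifications.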
------------- PR: https://git.openjdk.org/jdk/pull/11304 From sspitsyn at openjdk.org Wed Nov 23 10:14:23 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 23 Nov 2022 10:14:23 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2] In-Reply-To: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> References: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> Message-ID: <mBAIPYIyFTsHqBrLMOSuKna6IR-wbWgF1IVxnl2JGe0=.7c7fac27-8946-412e-9f27-d34ebec273c8@github.com> > This problem has two sides. > One is that the `VirtualThread::run()` caches the field `notifyJvmtiEvents` value. > It caused the native method `notifyJvmtiUnmountBegin()` not called after the field `notifyJvmtiEvents` > value has been set to `true` when an agent library is loaded into running VM. > The fix is to get rid of this caching. > Another is that enabling `notifyJvmtiEvents` notifications needs a synchronization. > Otherwise, a VTMS transition start can be missed which will cause some asserts to fire. > The fix is to use a JvmtiVTMSTransitionDisabler helper for sync. > > Testing: > The originally failed tests are passed now: > > runtime/vthread/RedefineClass.java > runtime/vthread/TestObjectAllocationSampleEvent.java > > In progress: > Run the tiers 1-6 to make sure there are no regressions.
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: remove caching if notifyJvmtiEvents in yieldContinuation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11304/files - new: https://git.openjdk.org/jdk/pull/11304/files/c0d2f0ef..6608b1a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11304&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11304&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11304.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11304/head:pull/11304 PR: https://git.openjdk.org/jdk/pull/11304 From sspitsyn at openjdk.org Wed Nov 23 10:18:59 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 23 Nov 2022 10:18:59 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2] In-Reply-To: <UAbgYFh2uFfbKiH0vpFRABzqC3O1_Vm1RYhNN99aJGY=.7ac0c84a-3ea4-45ee-a8bb-6c52ad515dff@github.com> References: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> <kOqnmcmgsWx3ci-8x10-_4SXgVZ03vQ17LiQ3BMOEvM=.e4c9c6be-85b2-461b-ab77-82e3f1e2869b@github.com> <UAbgYFh2uFfbKiH0vpFRABzqC3O1_Vm1RYhNN99aJGY=.7ac0c84a-3ea4-45ee-a8bb-6c52ad515dff@github.com> Message-ID: <sbRUVKEGSZ16-XIoJ25Jiqnh_6w8kF87IcTtIkYvkDY=.dcb6e8ff-e1f7-406b-8251-7f090fed85ce@github.com> On Wed, 23 Nov 2022 10:01:07 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> src/java.base/share/classes/java/lang/VirtualThread.java line 273: >> >>> 271: private void run(Runnable task) { >>> 272: assert state == RUNNING; >>> 273: boolean notifyJvmti = notifyJvmtiEvents; >> >> Don't we have same issue in yieldContinuation() method? (line 396) > > Good point. I'll check and make an update if needed. > Thank you for looking at it. Fixed the `yieldContinuation()` method. There is also `switchToCarrierThread()` method that returns the `notifyJvmtiEvents` value. 
It seems to be an optimization. I'm not sure yet, if we need to fix these places as well. ------------- PR: https://git.openjdk.org/jdk/pull/11304 From ngasson at openjdk.org Wed Nov 23 10:23:23 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Wed, 23 Nov 2022 10:23:23 GMT Subject: RFR: 8296208: AArch64: Enable SHA512 intrinsic by default on supported hardware In-Reply-To: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com> References: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com> Message-ID: <EPTwebrSwzHM3VQGXMV-iixnsGBKCBwV-sK7onDKa9A=.c905578d-bc88-4b51-ab00-8580c1cca48f@github.com> On Tue, 1 Nov 2022 06:08:26 GMT, Hao Sun <haosun at openjdk.org> wrote: > SHA512 intrinsic for AArch64 was implemented in JDK-8165404. But it was not auto-enabled due to the lack of full test on real hardware. In this patch, we set this intrinsic enabled by default on hardware with sha512 feature support, after we did the following evaluation. > > 1) tier1~3 passed without new failures. > > 2) we ran the JMH test case MessageDigests.java on all available sha512 feature supported CPUs on our hands including Neoverse V1, Neoverse N2 and Apple silicon(M1). We witnessed about 1.3x ~ 3x performance uplifts. Here shows the data on V1. 
>
>
> Benchmark                    (digesterName)  (length)  (provider)  Mode   Cnt  Before    After     Units
> MessageDigests.digest        SHA-384         64        DEFAULT     thrpt  5    2381.028  6161.576  ops/ms
> MessageDigests.digest        SHA-384         16384     DEFAULT     thrpt  5    20.641    60.493    ops/ms
> MessageDigests.digest        SHA-512         64        DEFAULT     thrpt  5    2407.225  6140.680  ops/ms
> MessageDigests.digest        SHA-512         16384     DEFAULT     thrpt  5    20.633    60.942    ops/ms
> MessageDigests.getAndDigest  SHA-384         64        DEFAULT     thrpt  5    1962.740  4714.510  ops/ms
> MessageDigests.getAndDigest  SHA-384         16384     DEFAULT     thrpt  5    20.474    61.360    ops/ms
> MessageDigests.getAndDigest  SHA-512         64        DEFAULT     thrpt  5    1949.511  4552.723  ops/ms
> MessageDigests.getAndDigest  SHA-512         16384     DEFAULT     thrpt  5    20.477    59.693    ops/ms

Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10925 From alanb at openjdk.org Wed Nov 23 10:33:22 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 23 Nov 2022 10:33:22 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2] In-Reply-To: <sbRUVKEGSZ16-XIoJ25Jiqnh_6w8kF87IcTtIkYvkDY=.dcb6e8ff-e1f7-406b-8251-7f090fed85ce@github.com> References: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> <kOqnmcmgsWx3ci-8x10-_4SXgVZ03vQ17LiQ3BMOEvM=.e4c9c6be-85b2-461b-ab77-82e3f1e2869b@github.com> <UAbgYFh2uFfbKiH0vpFRABzqC3O1_Vm1RYhNN99aJGY=.7ac0c84a-3ea4-45ee-a8bb-6c52ad515dff@github.com> <sbRUVKEGSZ16-XIoJ25Jiqnh_6w8kF87IcTtIkYvkDY=.dcb6e8ff-e1f7-406b-8251-7f090fed85ce@github.com> Message-ID: <veYG5Irc2bLioxns20-hR1n-yix-dDLvT0cR5f-8xz8=.6e76f4db-bced-46ec-bd7f-2d863ebc1dbb@github.com> On Wed, 23 Nov 2022 10:16:44 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: > There is also `switchToCarrierThread()` method that returns the `notifyJvmtiEvents` value. > It seems to be an optimization. I'm not sure yet, if we need to fix these places as well. It was to ensure that hide(true) and hide(false) are balanced.
If it were to re-poll notifyJvmtiEvents and a JVMTI agent enables the capability while a thread in doing a temporary transition then you may get a hide(false) without the corresponding hide(true). ------------- PR: https://git.openjdk.org/jdk/pull/11304 From mcimadamore at openjdk.org Wed Nov 23 10:54:53 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 23 Nov 2022 10:54:53 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v30] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <kPdkyjB-7Po8sPr7weSbsWIIHf2TxnnQa9EBLfV3Wrc=.6b4612f8-03a7-47c4-8e25-36e4fe803e8f@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix bit vs. 
byte mismatch in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/66dd888d..3c75e097 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=28-29 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From aph at openjdk.org Wed Nov 23 10:58:17 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 23 Nov 2022 10:58:17 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v3] In-Reply-To: <syTpW1xc6IoV30N1_PLphGvd9jePaErgIFJ_bhCJoqU=.8ca9e2f2-1415-4514-9677-e319d89b05c0@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <H7qFAMewkLJ4IWrVipLemv1iBgI5qrWOM1pfJ0p6hGk=.315fce9d-2c7d-42b8-a569-c74d8c7097f2@github.com> <WK7Sg9jDAwczPdU4Hax_iFJHBpipKrTBwAyXuX7IdlQ=.32813ab5-8c25-4dfe-9bc9-90d17c462af8@github.com> <syTpW1xc6IoV30N1_PLphGvd9jePaErgIFJ_bhCJoqU=.8ca9e2f2-1415-4514-9677-e319d89b05c0@github.com> Message-ID: <0NmyytQNPzQ7YJowhCYd4K-nm-ahIftkvOi0o-e5PGE=.81c3d51e-c54c-4e90-8b4b-6d597bfb75cf@github.com> On Thu, 17 Nov 2022 16:55:53 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: > > > Changes are good. Can you tell more about `-fsanitize=null` effect on libjvm size and performance of fastdebug build we use in testing? If it is only few percents I am for enabling it in debug build. > > > > > > It might be a bit more than that: it's a test-and-branch on every memory access. Maybe enable it only on a non-optimized build? > > I am fine with enabling it for debug VM. But can you give at least some numbers? Sorry I'm being slow on this. I'm trying to get scoped values done before the fork. 
------------- PR: https://git.openjdk.org/jdk/pull/10920 From jwaters at openjdk.org Wed Nov 23 12:00:49 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 23 Nov 2022 12:00:49 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas Message-ID: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> Add alignas to the permitted features set. Though the corresponding entry mentions this should not be done for classes, there's no actual difference in practice with all our supported compilers, because their nonstandard syntax also has the same limitations and issues with dynamic allocation as the C++ alignas, and including such a restriction of falling back to ATTRIBUTE_ALIGNED in the case of classes in the style guide would ultimately not really serve much of a point ------------- Commit messages: - Include the correct pandoc emitted html files - Restore html files - HotSpot Style Guide Changes: https://git.openjdk.org/jdk/pull/11315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11315&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8252584 Stats: 5 lines in 2 files changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11315.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11315/head:pull/11315 PR: https://git.openjdk.org/jdk/pull/11315 From mbaesken at openjdk.org Wed Nov 23 12:01:23 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 23 Nov 2022 12:01:23 GMT Subject: RFR: 8297445: PPC64: Represent Registers as values In-Reply-To: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> References: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> Message-ID: <YkHA66hHe6fCbudZQUlEwVu3yvEhM7GNn4F4zqdeZl4=.af89d815-3636-4853-ac32-1a95488a4784@github.com> On Tue, 22 Nov 2022 18:10:42 GMT, Martin Doerr <mdoerr at openjdk.org> wrote: > The recent Register implementation uses wild 
pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11 ([JDK-8297426](https://bugs.openjdk.org/browse/JDK-8297426)). > Note: Implicit conversion from `intptr_t` to `RegisterOrConstant` is no longer supported. That's why I had to replace some `add` instructions. Looks good to me and fixes the issue I reported (https://bugs.openjdk.org/browse/JDK-8297426). You might want to check the SAP copyright header lines in some of the files. ------------- Marked as reviewed by mbaesken (Reviewer). PR: https://git.openjdk.org/jdk/pull/11297 From thartmann at openjdk.org Wed Nov 23 12:03:37 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Nov 2022 12:03:37 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM Message-ID: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> `Method::build_profiling_method_data` acquires the `MethodData_lock` when initializing `Method::_method_data` to prevent multiple allocations by different threads. The problem is that when metaspace allocation fails and `JvmtiExport::should_post_resource_exhausted()` is set, we assert during the `ThreadToNativeFromVM` transition in JVMTI code. 
Since concurrent initialization is a rare event, I suggest to get rid of the lock and perform the initialization with a `cmpxchg`, similar to how method counters are initialized: https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L644-L646 Since [current code](https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.inline.hpp#L41-L46) in `Method::set_method_data` uses a `Atomic::release_store`, I added a `OrderAccess::release()`. Thanks, Tobias ------------- Commit messages: - 8297389: resexhausted003 fails with assert(\!thread->owns_locks()) failed: must release all locks when leaving VM Changes: https://git.openjdk.org/jdk/pull/11316/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11316&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297389 Stats: 26 lines in 1 file changed: 7 ins; 4 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/11316.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11316/head:pull/11316 PR: https://git.openjdk.org/jdk/pull/11316 From mdoerr at openjdk.org Wed Nov 23 12:09:15 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 23 Nov 2022 12:09:15 GMT Subject: RFR: 8297445: PPC64: Represent Registers as values [v2] In-Reply-To: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> References: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> Message-ID: <Rf-Uau-SX_8rCozdisKhlKqpg99nQ0txNbUZtQJShLM=.a04cbfa3-0ef5-4c96-b3ea-b7f99f899941@github.com> > The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). 
Problems exist when trying to build with GCC 11 ([JDK-8297426](https://bugs.openjdk.org/browse/JDK-8297426)). > Note: Implicit conversion from `intptr_t` to `RegisterOrConstant` is no longer supported. That's why I had to replace some `add` instructions. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Update Copyright years. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11297/files - new: https://git.openjdk.org/jdk/pull/11297/files/5af4a37b..fc905a96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11297&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11297&range=00-01 Stats: 10 lines in 9 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/11297.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11297/head:pull/11297 PR: https://git.openjdk.org/jdk/pull/11297 From ayang at openjdk.org Wed Nov 23 13:11:09 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 23 Nov 2022 13:11:09 GMT Subject: RFR: 8297499: Parallel: Missing klass when marking objArray object in Full GC Message-ID: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> Extending the current class-unloading test to expose a pre-existing issue in Parallel and the fix. 
Test: the revised test fails for Parallel without the fix ------------- Commit messages: - pgc-do-klass Changes: https://git.openjdk.org/jdk/pull/11321/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11321&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297499 Stats: 55 lines in 2 files changed: 43 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/11321.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11321/head:pull/11321 PR: https://git.openjdk.org/jdk/pull/11321 From sjohanss at openjdk.org Wed Nov 23 13:11:10 2022 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 23 Nov 2022 13:11:10 GMT Subject: RFR: 8297499: Parallel: Missing klass when marking objArray object in Full GC In-Reply-To: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> References: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> Message-ID: <iEhfVlOW5stmK5ZQP8sgRZiLNu-J0nLBf4qhb-QPSd4=.adc1b710-40df-4d27-a446-323bfb8e5f49@github.com> On Wed, 23 Nov 2022 12:55:55 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Extending the current class-unloading test to expose a pre-existing issue in Parallel and the fix. > > Test: the revised test fails for Parallel without the fix Looks good. ------------- Marked as reviewed by sjohanss (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11321 From tschatzl at openjdk.org Wed Nov 23 13:11:10 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 23 Nov 2022 13:11:10 GMT Subject: RFR: 8297499: Parallel: Missing klass when marking objArray object in Full GC In-Reply-To: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> References: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> Message-ID: <JdIHhFvuyceuy58rGKUNYjTaLFgeSW1qHpZayJGpIeU=.1cbb575a-dae6-4296-93cb-20aa4b8c9561@github.com> On Wed, 23 Nov 2022 12:55:55 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Extending the current class-unloading test to expose a pre-existing issue in Parallel and the fix. > > Test: the revised test fails for Parallel without the fix Lgtm, nice catch! One nit: the name of the bug does not indicate what "missing klass" means, I would prefer spelling it out like: Parallel: Missing iteration over klass when marking objArrays/objArrayOops during Full GC or Parallel does not mark/iterate over/process klass for objArrays/objArrayOops during Full GC ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11321 From aph at openjdk.org Wed Nov 23 14:16:35 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 23 Nov 2022 14:16:35 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v15] In-Reply-To: <RPmfJsR3Kh1mGzc7Nd8ybgSnj_0eDiL2OeEECjF2puY=.0d5323fe-01c0-4c74-acb1-fb39cdbd69a9@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <ijL-Up8ZxxnWvQkYsbqxxt9_BvPfaszJ1FqzERpJtAE=.d91b0a69-af58-46e5-8eff-7d8f3f8b700c@github.com> <RPmfJsR3Kh1mGzc7Nd8ybgSnj_0eDiL2OeEECjF2puY=.0d5323fe-01c0-4c74-acb1-fb39cdbd69a9@github.com> Message-ID: <wuChkJBvx0fqNFaoO5EVJRLXOyADuFmJLPRb_rbZjts=.224c0dc0-755b-4c15-af7a-8693c87be06f@github.com> On Fri, 18 Nov 2022 17:31:30 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: >> >> - Reviewer feedback >> - Reviewer feedback Javadoc fixes > > src/java.base/share/classes/java/lang/Thread.java line 789: > >> 787: >> 788: // special value to mean a new thread >> 789: this.scopedValueBindings = NEW_THREAD_BINDINGS; > > Can we change the comment on this one to be the same as the other constructor? Done. 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 23 14:22:28 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 23 Nov 2022 14:22:28 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v6] In-Reply-To: <VQfLGLzs4-yMjO1eZMm-rku-Ti63OETdzNtRnaberMs=.8e95e70e-114d-4920-bb31-d3e8f72c502d@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <rFpWIyEO_DC9u3gyaxwlC3nK0gdo_2gGlX9bgNZvtZE=.2eec08c4-0a50-4f81-b4c9-45eba639f941@github.com> <nK7KFzzYywjKGp2TTJF1ALF4yQdkN0HC1ja2wtZTSRc=.ef43617b-9781-44cf-b3d1-7ad951ec598a@github.com> <XEqyhW1QAJKrizAC-gOblrN7Pa1F_f2C8I5XO9RJqBM=.bc08c201-9382-43a6-a68b-b809028524a8@github.com> <CRejKgLiOJHhFPCEfxGBnnNGZiFhES4Ac9Dqtgq2VWY=.12932aae-9015-4452-9a7c-4c59b1981495@github.com> <b96D_PGxmI4uKC9L9GBappJWwfZS_ken4e7ho3HCoA4=.5b5db3c9-316d-46a9-9e7f-3f85ff46d73b@github.com> <8Gol6phltQIqgGpXbVDn_iUDtqyRm8NKy_U63w2oQ8g=.ba7a834d-bb03-4a20-ae69-4371068f1439@github.com> <xviMqszPHdP7wFjNGMk5u5o81rC-KK8GWRQQwAA642E=.306213e0-8b8b-4f96-9ee3-f7eac92015b4@github.com> <A9xOoYTgicpAXaATLn1jLW1X0pP7ZlDRh0ISp9E1H3M=.7ee59a3e-5e66-43a0-89b2-fcfa4674b449@github.com> <okStohonRqnvGd9Sx3M4vIfuwrDfZW-9bHUQn-BrtKk=.8c8f166d-025a-43b9-88d7-6e7882ac1394@github.com> <VQfLGLzs4-yMjO1eZMm-rku-Ti63OETdzNtRnaberMs=.8e95e70e-114d-4920-bb31-d3e8f72c502d@github.com> Message-ID: <-BdUpu_Zl8KQtodaCeDyz4zhcQhuyI7Z1ujcqg7-c8Q=.25c08e24-cc8f-4dda-b78a-dd85c6d32d66@github.com> On Wed, 16 Nov 2022 13:13:59 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> Note that `ScopedValue` can currently be bound to `null`, but by using `Optional`, there would be no way to differentiate an unbound `ScopedValue` from one bound to `null`. > >> Note that `ScopedValue` can currently be bound to `null`, but by using `Optional`, there would be no way to differentiate an unbound `ScopedValue` from one bound to `null`.
> > That's right, an Optional view would have to deal with that. OK, so unless anyone has any /strong/ objections I'll mark this as Resolved. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 23 14:33:48 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 23 Nov 2022 14:33:48 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v22] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <pFWfLHMs3pXWJ91-x19vfCvJ9q6-q6vVAZF_UmP5cBQ=.e98ae2b0-6fc1-47a1-ae0b-9717ba6abce5@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/classfile/vmSymbols.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/dc577736..b28ca4d2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=20-21 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 23 14:33:51 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 23 Nov 2022 14:33:51 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v21] In-Reply-To: <P3uXRY2QdSGl5AGCu_UxBxV_sipWpZXxr_k4uaTRSlE=.390bd458-4e8f-40dd-9f62-5b64cedff5f3@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <P3uXRY2QdSGl5AGCu_UxBxV_sipWpZXxr_k4uaTRSlE=.390bd458-4e8f-40dd-9f62-5b64cedff5f3@github.com> Message-ID: 
<A04_IP8SiNIebKiQBkDS1rJXePANW8mxTsC4H08wCFo=.9c1bb4da-3612-4952-86c6-bee170bd8412@github.com> On Tue, 22 Nov 2022 17:34:22 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: > > - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 > - Update src/java.base/share/classes/java/lang/VirtualThread.java > > Co-authored-by: Alan Bateman <Alan.Bateman at oracle.com> > - Feedback from reviewers src/hotspot/share/classfile/vmSymbols.hpp line 612: > 610: template(thread_throwable_void_signature, "(Ljava/lang/Thread;Ljava/lang/Throwable;)V") \ > 611: template(thread_void_signature, "(Ljava/lang/Thread;)V") \ > 612: template(runnable_void_signature, "(Ljava/lang/Runnable;)V") \ Suggestion: template(runnable_void_signature, "(Ljava/lang/Runnable;)V") \ ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 23 14:38:52 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 23 Nov 2022 14:38:52 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v23] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <cJ99HZOt75ImLfr4brP_sP2t1Z4vN3Xd13j56RIfbyg=.e29f0d0a-2618-401f-8e84-d6e9bdd7b55b@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Feedback from reviewers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/b28ca4d2..7ac61ba2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=21-22 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From sjohanss at openjdk.org Wed Nov 23 16:06:20 2022 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 23 Nov 2022 16:06:20 GMT Subject: RFR: 8297427: Avoid keeping class loaders alive when executing ClassLoaderStatsVMOperation Message-ID: <YdNWoA79MYjB7s-lcu1eYljYAGoV46gjbnYqX0PzJGc=.93d62d9d-0642-4a0d-9f90-b8b1885d4ace@github.com> Please review this change to avoid keeping classes alive only due to the `ClassLoaderStatsVMOperation`. **Summary** The `ClassLoaderStatsVMOperation` is gathering statistics about the active class loaders in a safepoint. The way the `ClassLoaderDataGraph` is iterated will keep the class loaders live. This is not really needed since everything is done in a safepoint and nothing needs to be explicitly kept alive. This has not been a problem prior to concurrent class unloading in ZGC. With fully concurrent class unloading a `ClassLoaderStatsVMOperation` can occur during a collection and more classes than needed might be kept alive. This could in turn lead to premature Metaspace OOM. The solution is to not keep the class loaders alive due to the iteration in `ClassLoaderStatsVMOperation`. **Testing** * Added a new test that covers the two different ways a class could previously be kept alive by the VM operation. The test passes after the fix but failed before. 
* Mach5 tier 1-3 ------------- Commit messages: - Missing include for minimal - Axel comments to use templates - Fix test indent - Move ChildClassLoader into test - Add parent_no_keepalive to allow class unloading in parents - Extend test to cover keep alive through access to parent - Avoid keeping CLD alive for class loader stats vm op - Test to provoke failed class unloading Changes: https://git.openjdk.org/jdk/pull/11300/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11300&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297427 Stats: 346 lines in 19 files changed: 282 ins; 51 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/11300.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11300/head:pull/11300 PR: https://git.openjdk.org/jdk/pull/11300 From mcimadamore at openjdk.org Wed Nov 23 17:33:06 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 23 Nov 2022 17:33:06 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v31] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <S9AFA1STby7c240-cSlvO-e0MekMt-uKHFAqUbOnoOU=.fdab60c5-eeb5-4f5e-b189-409225b7500f@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: * remove unused Scoped interface * re-add trusting of final fields in layout class implementations * Fix BulkOps benchmark, which had alignment issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/3c75e097..97168155 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=29-30 Stats: 56 lines in 5 files changed: 8 ins; 39 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From egahlin at openjdk.org Wed Nov 23 17:49:26 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 23 Nov 2022 17:49:26 GMT Subject: RFR: 8296671: [JFR] jdk.ContainerConfiguration event should include host total memory In-Reply-To: <oeClK9flGtbMJwedbMytX5I_94NaXNkNQ9XMjACBsSI=.7d360f67-460a-4631-9743-32e970b16a9f@github.com> References: <oeClK9flGtbMJwedbMytX5I_94NaXNkNQ9XMjACBsSI=.7d360f67-460a-4631-9743-32e970b16a9f@github.com> Message-ID: <MVxi2koZp0-NAln4baJ-UkXVKeXYDvhVX_-4cufrmY0=.4b65300a-2531-433e-9984-889721dd2830@github.com> On Mon, 14 Nov 2022 20:19:29 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: > Please review this addition to the jdk.ContainerConfigration event which adds information > about the container host. Specifically, the total amount of memory of the host system. > > Testing: > - [x] New test case (passed, fails before) > - [x] JFR tests. Passed. > > Thoughts? Marked as reviewed by egahlin (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/11143 From aph at openjdk.org Wed Nov 23 18:18:56 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 23 Nov 2022 18:18:56 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v24] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <Dw2m9wLr8utF7WHlA4rfhjziWk0Xx9LvlbowBZwzoHQ=.c824ec4d-c571-4e52-b919-34871bddf9df@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: javadoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/7ac61ba2..8c526003 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=22-23 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 23 18:25:41 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 23 Nov 2022 18:25:41 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v11] In-Reply-To: <h3B01LlqVD3uiXhhGeln_rF6krBDWyUbw0oOCh69ZYU=.9d1c5d6e-4ef5-464e-a232-20c95d81b5d3@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <nGXhxc7fXpK6UayTXFSfaE406z1kfyQiS4SOVUBo2oU=.07531d82-960f-4fdb-b1b4-cb0bfdba683d@github.com> <h3B01LlqVD3uiXhhGeln_rF6krBDWyUbw0oOCh69ZYU=.9d1c5d6e-4ef5-464e-a232-20c95d81b5d3@github.com> Message-ID: <gRktJUWbj9kyJOEVnFO_GTs5nX_xgtXIrD0StBxHNi8=.5c976e12-58d6-4227-9d0c-8e403c44ca63@github.com> On Fri, 18 Nov 2022 15:19:35 GMT, Alan 
Bateman <alanb at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Reviewer feedback > > src/java.base/share/classes/java/lang/Thread.java line 787: > >> 785: >> 786: // special value to mean a new thread >> 787: this.scopedValueBindings = Thread.class; > > The addition of NEW_THREAD_BINDINGS means this one should change too. The update means the comment should probably be adjusted too, maybe "initial value for a new thread". Done ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Wed Nov 23 18:39:19 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 23 Nov 2022 18:39:19 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v25] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <9ZhPphzsMfl1vMcmbOjnzXi1THKCagn-RIE5TAE3a0M=.0e019155-9ea9-4e6c-ab03-a38cf7a7de33@github.com> > JEP 429 implementation. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 52 commits: - Merge master - javadoc - Feedback from reviewers - Update src/hotspot/share/classfile/vmSymbols.hpp - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 - Update src/java.base/share/classes/java/lang/VirtualThread.java Co-authored-by: Alan Bateman <Alan.Bateman at oracle.com> - Update src/java.base/share/classes/java/lang/Thread.java Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> - Update src/hotspot/cpu/aarch64/aarch64.ad Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> - Feedback from reviewers - Remove incorrect assertion. - ... 
and 42 more: https://git.openjdk.org/jdk/compare/2afb4c33...30f150e1 ------------- Changes: https://git.openjdk.org/jdk/pull/10952/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=24 Stats: 3332 lines in 61 files changed: 2907 ins; 254 del; 171 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From psandoz at openjdk.org Wed Nov 23 19:18:39 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 23 Nov 2022 19:18:39 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v25] In-Reply-To: <9ZhPphzsMfl1vMcmbOjnzXi1THKCagn-RIE5TAE3a0M=.0e019155-9ea9-4e6c-ab03-a38cf7a7de33@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <9ZhPphzsMfl1vMcmbOjnzXi1THKCagn-RIE5TAE3a0M=.0e019155-9ea9-4e6c-ab03-a38cf7a7de33@github.com> Message-ID: <Lto2AKQzVhmhlJeEGUVReOe76YWav8zFweBP2Jq1JNA=.3589ae3a-eeab-436f-bf2f-54c71f29f4a6@github.com> On Wed, 23 Nov 2022 18:39:19 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 52 commits: > > - Merge master > - javadoc > - Feedback from reviewers > - Update src/hotspot/share/classfile/vmSymbols.hpp > - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 > - Update src/java.base/share/classes/java/lang/VirtualThread.java > > Co-authored-by: Alan Bateman <Alan.Bateman at oracle.com> > - Update src/java.base/share/classes/java/lang/Thread.java > > Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> > - Update src/hotspot/cpu/aarch64/aarch64.ad > > Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> > - Feedback from reviewers > - Remove incorrect assertion. > - ... 
and 42 more: https://git.openjdk.org/jdk/compare/2afb4c33...30f150e1 Looks good (just some last minor comments). I did not focus on the tests, nor too closely at all of the HotSpot changes. Something for future investigation perhaps (if not already thought about): consider using a persistent map, a Hash Array Mapped Trie (HAMT), for storing scoped keys and values, which could potentially remove the need for the cache or replace the cache when many values are in scope. The HAMT's structural sharing properties, wide-branching factor, and `Integer.bitCount` being intrinsic all make for an efficient implementation. src/hotspot/share/classfile/vmSymbols.hpp line 401: > 399: template(daemon_name, "daemon") \ > 400: template(run_method_name, "run") \ > 401: template(call_method_name, "call") \ Is this used? src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 209: > 207: final int bitmask; > 208: > 209: private static final Object NIL = new Object(); Suggestion: static final Object NO_VALUE = new Object(); src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 212: > 210: > 211: static final Snapshot EMPTY_SNAPSHOT = new Snapshot(); > 212: Snapshot(Carrier bindings, Snapshot prev) { Suggestion: static final Snapshot EMPTY_SNAPSHOT = new Snapshot(); Snapshot(Carrier bindings, Snapshot prev) { src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 218: > 216: } > 217: > 218: protected Snapshot() { Suggestion: Snapshot() { src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 464: > 462: * Calls a value-returning operation with a {@code ScopedValue} bound to a value > 463: * in the current thread.
When the operation completes (normally or with an > 464: * exception), the {@code ScopedValue} will revert to being unbound, or rervert to Suggestion: * exception), the {@code ScopedValue} will revert to being unbound, or revert to ------------- Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.org/jdk/pull/10952 From sspitsyn at openjdk.org Wed Nov 23 19:40:27 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 23 Nov 2022 19:40:27 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2] In-Reply-To: <veYG5Irc2bLioxns20-hR1n-yix-dDLvT0cR5f-8xz8=.6e76f4db-bced-46ec-bd7f-2d863ebc1dbb@github.com> References: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> <kOqnmcmgsWx3ci-8x10-_4SXgVZ03vQ17LiQ3BMOEvM=.e4c9c6be-85b2-461b-ab77-82e3f1e2869b@github.com> <UAbgYFh2uFfbKiH0vpFRABzqC3O1_Vm1RYhNN99aJGY=.7ac0c84a-3ea4-45ee-a8bb-6c52ad515dff@github.com> <sbRUVKEGSZ16-XIoJ25Jiqnh_6w8kF87IcTtIkYvkDY=.dcb6e8ff-e1f7-406b-8251-7f090fed85ce@github.com> <veYG5Irc2bLioxns20-hR1n-yix-dDLvT0cR5f-8xz8=.6e76f4db-bced-46ec-bd7f-2d863ebc1dbb@github.com> Message-ID: <iXz95IMUD9ryM0emzUwOzcO5kLf6MhCOc_umnzid2Jk=.5aca0ef4-9ebc-4a50-b3d9-ff9df1a62f42@github.com> On Wed, 23 Nov 2022 10:31:00 GMT, Alan Bateman <alanb at openjdk.org> wrote: >> Fixed the `yieldContinuation()` method. >> There is also `switchToCarrierThread()` method that returns the `notifyJvmtiEvents` value. >> It seems to be an optimization. I'm not sure yet, if we need to fix these places as well. > >> There is also `switchToCarrierThread()` method that returns the `notifyJvmtiEvents` value. >> It seems to be an optimization. I'm not sure yet, if we need to fix these places as well. > > It was to ensure that hide(true) and hide(false) are balanced. 
If it were to re-poll notifyJvmtiEvents and a JVMTI agent enables the capability while a thread is doing a temporary transition then you may get a hide(false) without the corresponding hide(true). Okay, I see it now. ------------- PR: https://git.openjdk.org/jdk/pull/11304 From svkamath at openjdk.org Wed Nov 23 19:58:41 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Wed, 23 Nov 2022 19:58:41 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" [v2] In-Reply-To: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> Message-ID: <6gcjBNNHk8twSr92oF2TGB90s6F4WnFZWT4xJPmuYoc=.c9c47aad-82a6-44a3-bc50-68e2b9e0a7c6@github.com> > 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11301/files - new: https://git.openjdk.org/jdk/pull/11301/files/bda63544..5af25e9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11301&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11301&range=00-01 Stats: 11 lines in 1 file changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/11301.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11301/head:pull/11301 PR: https://git.openjdk.org/jdk/pull/11301 From sgehwolf at openjdk.org Wed Nov 23 20:01:25 2022 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 23 Nov 2022 20:01:25 GMT Subject: RFR: 8296671: [JFR] jdk.ContainerConfiguration event should include host total memory In-Reply-To: <MVxi2koZp0-NAln4baJ-UkXVKeXYDvhVX_-4cufrmY0=.4b65300a-2531-433e-9984-889721dd2830@github.com> References: 
<oeClK9flGtbMJwedbMytX5I_94NaXNkNQ9XMjACBsSI=.7d360f67-460a-4631-9743-32e970b16a9f@github.com> <MVxi2koZp0-NAln4baJ-UkXVKeXYDvhVX_-4cufrmY0=.4b65300a-2531-433e-9984-889721dd2830@github.com> Message-ID: <hSj2gbdoSC9lPMfJBkyHaOPgJkRiQx570gak0mHNlVM=.c6171be5-916b-4648-aa68-e04215598168@github.com> On Wed, 23 Nov 2022 17:45:24 GMT, Erik Gahlin <egahlin at openjdk.org> wrote: >> Please review this addition to the jdk.ContainerConfigration event which adds information >> about the container host. Specifically, the total amount of memory of the host system. >> >> Testing: >> - [x] New test case (passed, fails before) >> - [x] JFR tests. Passed. >> >> Thoughts? > > Marked as reviewed by egahlin (Reviewer). Thanks for the review @egahlin! ------------- PR: https://git.openjdk.org/jdk/pull/11143 From rrich at openjdk.org Wed Nov 23 20:24:59 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 23 Nov 2022 20:24:59 GMT Subject: RFR: 8297445: PPC64: Represent Registers as values [v2] In-Reply-To: <Rf-Uau-SX_8rCozdisKhlKqpg99nQ0txNbUZtQJShLM=.a04cbfa3-0ef5-4c96-b3ea-b7f99f899941@github.com> References: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> <Rf-Uau-SX_8rCozdisKhlKqpg99nQ0txNbUZtQJShLM=.a04cbfa3-0ef5-4c96-b3ea-b7f99f899941@github.com> Message-ID: <b9i0aCatmRn-RBtkQ3lRWgZGfKcwc0RXcrYKcRLxg80=.066a0fd3-a560-46ab-bd6b-e9e97b8240de@github.com> On Wed, 23 Nov 2022 12:09:15 GMT, Martin Doerr <mdoerr at openjdk.org> wrote: >> The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11 ([JDK-8297426](https://bugs.openjdk.org/browse/JDK-8297426)). >> Note: Implicit conversion from `intptr_t` to `RegisterOrConstant` is no longer supported. 
That's why I had to replace some `add` instructions. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright years. Clean as a whistle :) Thanks, Richard. src/hotspot/cpu/ppc/register_ppc.hpp line 89: > 87: bool operator==(const Register rhs) const { return _encoding == rhs._encoding; } > 88: bool operator!=(const Register rhs) const { return _encoding != rhs._encoding; } > 89: const Register* operator->() const { return this; } This is clever! Personally I'd be in favor of removing it in a cleanup change for simplicity. ------------- Marked as reviewed by rrich (Reviewer). PR: https://git.openjdk.org/jdk/pull/11297 From rrich at openjdk.org Wed Nov 23 20:33:33 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 23 Nov 2022 20:33:33 GMT Subject: RFR: 8297445: PPC64: Represent Registers as values [v2] In-Reply-To: <b9i0aCatmRn-RBtkQ3lRWgZGfKcwc0RXcrYKcRLxg80=.066a0fd3-a560-46ab-bd6b-e9e97b8240de@github.com> References: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> <Rf-Uau-SX_8rCozdisKhlKqpg99nQ0txNbUZtQJShLM=.a04cbfa3-0ef5-4c96-b3ea-b7f99f899941@github.com> <b9i0aCatmRn-RBtkQ3lRWgZGfKcwc0RXcrYKcRLxg80=.066a0fd3-a560-46ab-bd6b-e9e97b8240de@github.com> Message-ID: <SwLkuxJBEiEJKGpyfI_UjMvGpzn6-gngveGLxaoHcj8=.6cda4ea9-94f0-4c6b-b552-10c7a31ed7f1@github.com> On Wed, 23 Nov 2022 15:41:32 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Update Copyright years. > > src/hotspot/cpu/ppc/register_ppc.hpp line 89: > >> 87: bool operator==(const Register rhs) const { return _encoding == rhs._encoding; } >> 88: bool operator!=(const Register rhs) const { return _encoding != rhs._encoding; } >> 89: const Register* operator->() const { return this; } > > This is clever! 
Personally I'd be in favor of removing it in a cleanup change for simplicity. Ah, it's also used in shared code (e.g. FrameMap::regname()) so cleanup is only possible after all ports represent Registers as values. ------------- PR: https://git.openjdk.org/jdk/pull/11297 From mdoerr at openjdk.org Wed Nov 23 21:04:21 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 23 Nov 2022 21:04:21 GMT Subject: RFR: 8297445: PPC64: Represent Registers as values [v2] In-Reply-To: <Rf-Uau-SX_8rCozdisKhlKqpg99nQ0txNbUZtQJShLM=.a04cbfa3-0ef5-4c96-b3ea-b7f99f899941@github.com> References: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> <Rf-Uau-SX_8rCozdisKhlKqpg99nQ0txNbUZtQJShLM=.a04cbfa3-0ef5-4c96-b3ea-b7f99f899941@github.com> Message-ID: <K8CbzGBr1YOj79zDBaC05aLIvhYMgRK6JYQSzPCVUwM=.1415a266-534d-4ed1-bd5d-d60cfe9dab47@github.com> On Wed, 23 Nov 2022 12:09:15 GMT, Martin Doerr <mdoerr at openjdk.org> wrote: >> The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11 ([JDK-8297426](https://bugs.openjdk.org/browse/JDK-8297426)). >> Note: Implicit conversion from `intptr_t` to `RegisterOrConstant` is no longer supported. That's why I had to replace some `add` instructions. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright years. Thanks for the reviews! 
------------- PR: https://git.openjdk.org/jdk/pull/11297 From mdoerr at openjdk.org Wed Nov 23 21:04:21 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 23 Nov 2022 21:04:21 GMT Subject: RFR: 8297445: PPC64: Represent Registers as values [v2] In-Reply-To: <SwLkuxJBEiEJKGpyfI_UjMvGpzn6-gngveGLxaoHcj8=.6cda4ea9-94f0-4c6b-b552-10c7a31ed7f1@github.com> References: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> <Rf-Uau-SX_8rCozdisKhlKqpg99nQ0txNbUZtQJShLM=.a04cbfa3-0ef5-4c96-b3ea-b7f99f899941@github.com> <b9i0aCatmRn-RBtkQ3lRWgZGfKcwc0RXcrYKcRLxg80=.066a0fd3-a560-46ab-bd6b-e9e97b8240de@github.com> <SwLkuxJBEiEJKGpyfI_UjMvGpzn6-gngveGLxaoHcj8=.6cda4ea9-94f0-4c6b-b552-10c7a31ed7f1@github.com> Message-ID: <RWwv19I7P0hnIL_n-AlXz4fMgeazbVrVF7SV9IFnVtE=.9f3f11f8-da4f-4af6-81d7-6ba069a8ea15@github.com> On Wed, 23 Nov 2022 20:29:39 GMT, Richard Reingruber <rrich at openjdk.org> wrote: >> src/hotspot/cpu/ppc/register_ppc.hpp line 89: >> >>> 87: bool operator==(const Register rhs) const { return _encoding == rhs._encoding; } >>> 88: bool operator!=(const Register rhs) const { return _encoding != rhs._encoding; } >>> 89: const Register* operator->() const { return this; } >> >> This is clever! Personally I'd be in favor of removing it in a cleanup change for simplicity. > > Ah, it's also used in shared code (e.g. FrameMap::regname()) so cleanup is only possible after all ports represent Registers as values. Correct. I think this should get done at some point of time. 
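Editorial sketch of the point discussed in this thread: the value-based representation can keep `reg->encoding()` style call sites compiling by having `operator->` return `this`. The class below is a simplified, hypothetical stand-in and not the actual `register_ppc.hpp`; the constructor, `is_valid()`, the 32-register bound, `noreg`, `R3`, and `legacy_encoding` are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a value-based register class.
// Only the three operators mirror the lines quoted in the review.
class Register {
  int _encoding;

 public:
  constexpr explicit Register(int encoding) : _encoding(encoding) {}

  constexpr int encoding() const { return _encoding; }
  // Assumed validity rule for this sketch: encodings 0..31 are real registers.
  constexpr bool is_valid() const { return 0 <= _encoding && _encoding < 32; }

  constexpr bool operator==(const Register rhs) const { return _encoding == rhs._encoding; }
  constexpr bool operator!=(const Register rhs) const { return _encoding != rhs._encoding; }

  // Returning 'this' lets code written for the old pointer-based
  // representation (reg->encoding()) keep compiling against a value type,
  // without dereferencing any fabricated pointer.
  constexpr const Register* operator->() const { return this; }
};

// Illustrative constants; the names follow HotSpot conventions but are
// defined here only for the sketch.
constexpr Register noreg(-1);
constexpr Register R3(3);

// Mimics shared code that still uses pointer syntax on a Register.
inline int legacy_encoding(Register reg) { return reg->encoding(); }
```

Since `noreg` is an ordinary object holding an out-of-range encoding, calling a member on it is well defined. As noted above, the `operator->` shim could presumably only be dropped once all ports and the shared code use value syntax.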
------------- PR: https://git.openjdk.org/jdk/pull/11297 From dholmes at openjdk.org Wed Nov 23 21:46:24 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 23 Nov 2022 21:46:24 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM In-Reply-To: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> Message-ID: <1dQAA1ZA1dgxspW6qeaMHKsvYAmO1drbNMQShWlaiEU=.fc7678af-96d4-410c-800b-6af405642422@github.com> On Wed, 23 Nov 2022 11:56:45 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > `Method::build_profiling_method_data` acquires the `MethodData_lock` when initializing `Method::_method_data` to prevent multiple allocations by different threads. The problem is that when metaspace allocation fails and `JvmtiExport::should_post_resource_exhausted()` is set, we assert during the `ThreadToNativeFromVM` transition in JVMTI code. > > Since concurrent initialization is a rare event, I suggest to get rid of the lock and perform the initialization with a `cmpxchg`, similar to how method counters are initialized: > https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L644-L646 > > Since [current code](https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.inline.hpp#L41-L46) in `Method::set_method_data` uses a `Atomic::release_store`, I added a `OrderAccess::release()`. > > Thanks, > Tobias If `MethodData::allocate` cannot be called whilst holding a lock then perhaps we can assert that in there? I think there is a broader problem here that metaspace allocation can occur in numerous places and any failure could post the resource-exhausted event and potentially lead to problems like this. 
This fix side-steps one problematic call-site, but going lock-free has its own concerns. src/hotspot/share/oops/method.cpp line 594: > 592: > 593: ClassLoaderData* loader_data = method->method_holder()->class_loader_data(); > 594: MethodData* method_data = MethodData::allocate(loader_data, method, THREAD); So the downside of lock-free here is that we have to pre-allocate and then later free. How expensive is that? How likely are we to get multiple threads attempting this at the same time? We might trigger a resource-exhausted event unnecessarily due to the temporary use of metaspace. src/hotspot/share/oops/method.cpp line 604: > 602: // total store order (TSO) the reference may become visible before > 603: // the initialization of data otherwise. > 604: OrderAccess::release(); A release here is not necessary as `Atomic::replace_if_null` will by default have `memory_order_conservative` which maintains the full bi-directional fence of the internal `cmpxchg`. ------------- Changes requested by dholmes (Reviewer). 
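Editorial sketch of the trade-off raised in this review: a lock-free initializer has to allocate speculatively, publish with a compare-and-swap, and free its allocation if another thread won the race. The code uses standard `std::atomic` as a stand-in for HotSpot's `Atomic::replace_if_null`; `ProfileData`, `MethodLike`, and `install_profile_data` are hypothetical names, and the default `seq_cst` ordering is only an approximation of `memory_order_conservative`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for MethodData.
struct ProfileData {
  int owner_id;
};

// Hypothetical stand-in for Method; holds the lazily installed pointer.
struct MethodLike {
  std::atomic<ProfileData*> data{nullptr};
};

// Returns whichever ProfileData ended up installed in 'm'.
ProfileData* install_profile_data(MethodLike& m, int owner_id) {
  // Fast path: already initialized by some thread.
  ProfileData* existing = m.data.load(std::memory_order_acquire);
  if (existing != nullptr) {
    return existing;
  }

  // Speculative allocation: this is the cost of going lock-free, since the
  // object must exist before we know whether we win the race.
  ProfileData* fresh = new ProfileData{owner_id};

  // Publish only if the slot is still null. The default seq_cst ordering
  // already orders the initialization of *fresh before the pointer becomes
  // visible, so no separate release fence is needed before the CAS.
  ProfileData* expected = nullptr;
  if (m.data.compare_exchange_strong(expected, fresh)) {
    return fresh;
  }

  // Lost the race: another thread installed its object first, so the
  // temporary allocation is released again.
  delete fresh;
  return expected;  // compare_exchange_strong stored the winner here
}
```

In this sketch the losing thread's allocation is short-lived, which corresponds to the concern above that a temporary metaspace allocation could trigger a resource-exhausted event unnecessarily.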
PR: https://git.openjdk.org/jdk/pull/11316 From kbarrett at openjdk.org Wed Nov 23 21:47:09 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 23 Nov 2022 21:47:09 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <klNxsMmpOjXNtFhaX2Kqt-YqYSBCyCAoJkqnmFr_IeY=.b55c0f1a-4331-4d65-af05-a20b59f73f57@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <0fVP40VVRuOoZCEJ1M3BLubshBHbD4m_lj-j1qaGTTk=.391ade50-b3fe-4d4e-ae71-ba8a975a31cd@github.com> <mowWW-lO9a5Zo4iT-sbh1YTZFXO4UjKuJK7OpfrhFFo=.c7e5430a-de42-4465-ba85-10dfb8c71184@github.com> <UwMOA0K5cYSIeTkRgIl6QlPe2iTTA5z0vCH3jzUmx4E=.2b35fc6c-e96a-4807-863f-583631128a4e@github.com> <klNxsMmpOjXNtFhaX2Kqt-YqYSBCyCAoJkqnmFr_IeY=.b55c0f1a-4331-4d65-af05-a20b59f73f57@github.com> Message-ID: <3mBOtLQz_ulylm0XJkhUBwzPEV7AoPmMo20facw9Xn4=.c431ac9c-4f1f-4db6-bb46-745987f06777@github.com> On Wed, 23 Nov 2022 05:22:10 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> It's to avoid redefining the linkage as static in os_windows.cpp (where it's implemented) after an extern declaration (inside the class), which is forbidden by C++11: >> >>> The linkages implied by successive declarations for a given entity shall agree. That is, within a given scope, each declaration declaring the same variable name or the same overloading of a function name shall imply the same linkage. >> >> While 2019 by default seems to ignore this rule and accepts the conflicting linkage as a language extension, this can cause issues with newer and stricter versions of the Visual C++ compiler (especially with -permissive- passed during compilation, which Magnus and Daniel have pointed out in another discussion will become the default mode of compilation in the future). 
It's not possible to declare a static friend inside a class, so the addition above takes advantage of another C++ feature instead: >> >>> §11.3/4 [class.friend] >> A function first declared in a friend declaration has external linkage (3.5). Otherwise, the function retains its previous linkage (7.1.1). > > I think the problem here is the friend declaration, which doesn't look like it's needed and could be deleted. Digging into this some more, the friend declaration exists to provide access to the private `os::win32::enum Ept`. One obvious and cheap solution to that would be to make that enum public. I think that would be an improvement vs the current friend declaration. But there are some other things one could complain about there, such as the type of the function requiring a complicated function pointer cast where it's used. Here's a patch that I think cleans this up. diff --git a/src/hotspot/os/windows/os_windows.cpp b/src/hotspot/os/windows/os_windows.cpp index 0651f0868f3..bf9e759b1d6 100644 --- a/src/hotspot/os/windows/os_windows.cpp +++ b/src/hotspot/os/windows/os_windows.cpp @@ -511,7 +511,9 @@ JNIEXPORT LONG WINAPI topLevelExceptionFilter(struct _EXCEPTION_POINTERS* exceptionInfo); // Thread start routine for all newly created threads -static unsigned __stdcall thread_native_entry(Thread* thread) { +// Called with the associated Thread* as the argument. 
+unsigned __stdcall os::win32::thread_native_entry(void* t) { + Thread* thread = static_cast<Thread*>(t); thread->record_stack_base_and_size(); thread->initialize_thread_current(); @@ -744,7 +746,7 @@ bool os::create_thread(Thread* thread, ThreadType thr_type, thread_handle = (HANDLE)_beginthreadex(NULL, (unsigned)stack_size, - (unsigned (__stdcall *)(void*)) thread_native_entry, + &os::win32::thread_native_entry, thread, initflag, &thread_id); diff --git a/src/hotspot/os/windows/os_windows.hpp b/src/hotspot/os/windows/os_windows.hpp index 94d7c3c5e2d..197797078d7 100644 --- a/src/hotspot/os/windows/os_windows.hpp +++ b/src/hotspot/os/windows/os_windows.hpp @@ -36,7 +36,6 @@ typedef void (*signal_handler_t)(int); class os::win32 { friend class os; - friend unsigned __stdcall thread_native_entry(Thread*); protected: static int _processor_type; @@ -70,6 +69,10 @@ class os::win32 { static HINSTANCE load_Windows_dll(const char* name, char *ebuf, int ebuflen); private: + // The handler passed to _beginthreadex(). + // Called with the associated Thread* as the argument. 
+ static unsigned __stdcall thread_native_entry(void*); + enum Ept { EPT_THREAD, EPT_PROCESS, EPT_PROCESS_DIE }; // Wrapper around _endthreadex(), exit() and _exit() static int exit_process_or_thread(Ept what, int exit_code); ------------- PR: https://git.openjdk.org/jdk/pull/11081 From sviswanathan at openjdk.org Wed Nov 23 22:36:22 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 23 Nov 2022 22:36:22 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" [v2] In-Reply-To: <6gcjBNNHk8twSr92oF2TGB90s6F4WnFZWT4xJPmuYoc=.c9c47aad-82a6-44a3-bc50-68e2b9e0a7c6@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> <6gcjBNNHk8twSr92oF2TGB90s6F4WnFZWT4xJPmuYoc=.c9c47aad-82a6-44a3-bc50-68e2b9e0a7c6@github.com> Message-ID: <y2gsN6b6WILHriVv6JaQePOs4NkHe8FUMx8hfhZR23o=.7ebf8c73-f9b1-4756-885b-0a899231586e@github.com> On Wed, 23 Nov 2022 19:58:41 GMT, Smita Kamath <svkamath at openjdk.org> wrote: >> 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments Marked as reviewed by sviswanathan (Reviewer). The PR looks good to me. 
------------- PR: https://git.openjdk.org/jdk/pull/11301 From svkamath at openjdk.org Wed Nov 23 22:36:23 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Wed, 23 Nov 2022 22:36:23 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" [v2] In-Reply-To: <zzT0IfsPETkFu5NQWDkR9DUyl8fs_8vdwKFcfAium2k=.e7eba1e5-6aeb-461b-a24e-f02b885b739a@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> <zzT0IfsPETkFu5NQWDkR9DUyl8fs_8vdwKFcfAium2k=.e7eba1e5-6aeb-461b-a24e-f02b885b739a@github.com> Message-ID: <ZmfjKZMeilN65p5h3GK-WXB2LTpbACqBZofC06g80kQ=.20df63cc-3c0e-4621-a0dd-435ceb13d549@github.com> On Wed, 23 Nov 2022 05:07:00 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > > src/hotspot/share/runtime/sharedRuntime.cpp line 455: > >> 453: union {jfloat f; juint i;} bits; >> 454: bits.f = x; >> 455: jint doppel = bits.i; > > Doesn't the conversion from unsigned to signed risk a compiler warning being emitted? > > Can't you just use the existing `JavaValue` type to perform the union conversion trick? Hi David, thanks for pointing this out. I have updated the code to use jint. I have used the union conversion trick that was previously used in SharedRuntime::drem and SharedRuntime::frem. I hope that's okay with you. 
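As an aside on the bit-conversion idiom quoted in this thread, the following is a small sketch, not taken from `sharedRuntime.cpp`, of two ways to read a float's raw bits. The function names `bits_via_union` and `bits_via_memcpy` are hypothetical, and `std::int32_t` stands in for `jint`. Declaring the integer member as signed avoids the unsigned-to-signed conversion that was questioned in review. Note that reading the inactive member of a union is not sanctioned by ISO C++ (some compilers, such as GCC, document it as supported), whereas the `memcpy` form is portable. The sketch assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t),
              "sketch assumes a 32-bit float");

// Union form, mirroring the quoted snippet but with a signed integer
// member so no unsigned-to-signed conversion is needed afterwards.
inline std::int32_t bits_via_union(float x) {
  union { float f; std::int32_t i; } bits;
  bits.f = x;
  return bits.i;  // relies on compiler support for union type punning
}

// Portable form: copy the object representation byte for byte.
inline std::int32_t bits_via_memcpy(float x) {
  std::int32_t i;
  std::memcpy(&i, &x, sizeof i);
  return i;
}
```

Both functions return the same value on the common toolchains; compilers typically reduce either one to a single register move.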
------------- PR: https://git.openjdk.org/jdk/pull/11301 From dholmes at openjdk.org Thu Nov 24 00:00:22 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Nov 2022 00:00:22 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" [v2] In-Reply-To: <ZmfjKZMeilN65p5h3GK-WXB2LTpbACqBZofC06g80kQ=.20df63cc-3c0e-4621-a0dd-435ceb13d549@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> <zzT0IfsPETkFu5NQWDkR9DUyl8fs_8vdwKFcfAium2k=.e7eba1e5-6aeb-461b-a24e-f02b885b739a@github.com> <ZmfjKZMeilN65p5h3GK-WXB2LTpbACqBZofC06g80kQ=.20df63cc-3c0e-4621-a0dd-435ceb13d549@github.com> Message-ID: <R6CpsAWb2h9hugiDQWQld0V6BvAVqkhoA34Xo1ra3qQ=.b40d3bfe-5cd7-474d-90d6-4cb97400b0ef@github.com> On Wed, 23 Nov 2022 22:32:35 GMT, Smita Kamath <svkamath at openjdk.org> wrote: >> src/hotspot/share/runtime/sharedRuntime.cpp line 455: >> >>> 453: union {jfloat f; juint i;} bits; >>> 454: bits.f = x; >>> 455: jint doppel = bits.i; >> >> Doesn't the conversion from unsigned to signed risk a compiler warning being emitted? >> >> Can't you just use the existing `JavaValue` type to perform the union conversion trick? > > Hi David, thanks for pointing this out. I have updated the code to use jint. > I have used the union conversion trick that was previously used in SharedRuntime::drem and SharedRuntime::frem. I hope that's okay with you. We seem to employ this trick in a few places, e.g. also see metaprogramming/primitiveConversions.hpp. It would be good to reduce that so I will file a separate RFE. 
------------- PR: https://git.openjdk.org/jdk/pull/11301 From dholmes at openjdk.org Thu Nov 24 00:03:19 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Nov 2022 00:03:19 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" [v2] In-Reply-To: <6gcjBNNHk8twSr92oF2TGB90s6F4WnFZWT4xJPmuYoc=.c9c47aad-82a6-44a3-bc50-68e2b9e0a7c6@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> <6gcjBNNHk8twSr92oF2TGB90s6F4WnFZWT4xJPmuYoc=.c9c47aad-82a6-44a3-bc50-68e2b9e0a7c6@github.com> Message-ID: <cZepJGSEec9FH3_rd7MmfdacnpZaVHNOSnFWn54fHcE=.e71d6cc4-b59f-40ad-8c8f-af5f9fba79b6@github.com> On Wed, 23 Nov 2022 19:58:41 GMT, Smita Kamath <svkamath at openjdk.org> wrote: >> 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments I'm not familiar with the details of this code, so will put this through our CI to verify the test failure is dealt with. ------------- PR: https://git.openjdk.org/jdk/pull/11301 From dholmes at openjdk.org Thu Nov 24 00:30:47 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Nov 2022 00:30:47 GMT Subject: RFR: 8286185: The Java manpage can be more platform inclusive Message-ID: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> This is mainly an expansion of the included platforms by changing "linux and macOS" to "Non-Windows". There are a few additional examples, and clarification that they are just examples. There are also some minor edits and corrections I spotted. One actual fix relates to the "control-break" -> "control-" change. I can factor that out if needed (or just add an additional issue to the PR). 
This doesn't attempt to give complete platform recognition for all OpenJDK platforms. Two areas where anyone interested could file a further RFE is the support of DTrace on BSD systems other than macOS; and the use of RTM locking on Power8 architecture (existing documentation is all about Intel TSX on x86). Thanks. ------------- Commit messages: - 8286185: The Java manpage can be more platform inclusive Changes: https://git.openjdk.org/jdk/pull/11340/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11340&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8286185 Stats: 57 lines in 1 file changed: 19 ins; 3 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/11340.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11340/head:pull/11340 PR: https://git.openjdk.org/jdk/pull/11340 From dholmes at openjdk.org Thu Nov 24 00:47:36 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Nov 2022 00:47:36 GMT Subject: RFR: 8286185: The Java manpage can be more platform inclusive [v2] In-Reply-To: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> References: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> Message-ID: <t-sxAd4dlWirbnJHKeJDD-4MeOdmMkysY3dek9mlbX8=.e83b2fba-69c8-49db-aabb-73cbf87e981c@github.com> > This is mainly an expansion of the included platforms by changing "linux and macOS" to "Non-Windows". There are a few additional examples, and clarification that they are just examples. There are also some minor edits and corrections I spotted. > > One actual fix relates to the "control-break" -> "control-" change. I can factor that out if needed (or just add an additional issue to the PR). > > This doesn't attempt to give complete platform recognition for all OpenJDK platforms. 
Two areas where anyone interested could file a further RFE is the support of DTrace on BSD systems other than macOS; and the use of RTM locking on Power8 architecture (existing documentation is all about Intel TSX on x86). > > Thanks. David Holmes has updated the pull request incrementally with one additional commit since the last revision: Fixed formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11340/files - new: https://git.openjdk.org/jdk/pull/11340/files/01a5216c..1e62b16c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11340&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11340&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11340.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11340/head:pull/11340 PR: https://git.openjdk.org/jdk/pull/11340 From dholmes at openjdk.org Thu Nov 24 00:50:22 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Nov 2022 00:50:22 GMT Subject: RFR: 8286185: The Java manpage can be more platform inclusive [v3] In-Reply-To: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> References: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> Message-ID: <gs7MICgnmPqU3QYwn9XYjduMdxuyIW_mY7hWu5gBqTA=.74135117-30da-4bc4-a3d2-c06d7bf1dd8a@github.com> > This is mainly an expansion of the included platforms by changing "linux and macOS" to "Non-Windows". There are a few additional examples, and clarification that they are just examples. There are also some minor edits and corrections I spotted. > > One actual fix relates to the "control-break" -> "control-" change. I can factor that out if needed (or just add an additional issue to the PR). > > This doesn't attempt to give complete platform recognition for all OpenJDK platforms. 
Two areas where anyone interested could file a further RFE is the support of DTrace on BSD systems other than macOS; and the use of RTM locking on Power8 architecture (existing documentation is all about Intel TSX on x86). > > Thanks. David Holmes has updated the pull request incrementally with one additional commit since the last revision: Fix formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11340/files - new: https://git.openjdk.org/jdk/pull/11340/files/1e62b16c..0585f2ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11340&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11340&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11340.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11340/head:pull/11340 PR: https://git.openjdk.org/jdk/pull/11340 From sspitsyn at openjdk.org Thu Nov 24 01:25:19 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 24 Nov 2022 01:25:19 GMT Subject: RFR: 8286185: The Java manpage can be more platform inclusive [v3] In-Reply-To: <gs7MICgnmPqU3QYwn9XYjduMdxuyIW_mY7hWu5gBqTA=.74135117-30da-4bc4-a3d2-c06d7bf1dd8a@github.com> References: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> <gs7MICgnmPqU3QYwn9XYjduMdxuyIW_mY7hWu5gBqTA=.74135117-30da-4bc4-a3d2-c06d7bf1dd8a@github.com> Message-ID: <FH8jlP5UoSWSq-pQx_I6I06ps7jFJD0bRxInfQPaMMQ=.33f566fc-8e45-40e2-b1fd-745c1f88f59c@github.com> On Thu, 24 Nov 2022 00:50:22 GMT, David Holmes <dholmes at openjdk.org> wrote: >> This is mainly an expansion of the included platforms by changing "linux and macOS" to "Non-Windows". There are a few additional examples, and clarification that they are just examples. There are also some minor edits and corrections I spotted. >> >> One actual fix relates to the "control-break" -> "control-" change. I can factor that out if needed (or just add an additional issue to the PR). 
>> >> This doesn't attempt to give complete platform recognition for all OpenJDK platforms. Two areas where anyone interested could file a further RFE is the support of DTrace on BSD systems other than macOS; and the use of RTM locking on Power8 architecture (existing documentation is all about Intel TSX on x86). >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Fix formatting Nice doc update. Looks good to me. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/11340 From lmesnik at openjdk.org Thu Nov 24 01:39:18 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 24 Nov 2022 01:39:18 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2] In-Reply-To: <mBAIPYIyFTsHqBrLMOSuKna6IR-wbWgF1IVxnl2JGe0=.7c7fac27-8946-412e-9f27-d34ebec273c8@github.com> References: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> <mBAIPYIyFTsHqBrLMOSuKna6IR-wbWgF1IVxnl2JGe0=.7c7fac27-8946-412e-9f27-d34ebec273c8@github.com> Message-ID: <5CWDDqV9i7S2TQeIhw0PtKjvWbnH2YqFUwiwZHXzm3g=.882e76f5-c157-49c8-a175-94693c323a8e@github.com> On Wed, 23 Nov 2022 10:14:23 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> This problem has two sides. >> One is that the `VirtualThread::run() `cashes the field `notifyJvmtiEvents` value. >> It caused the native method `notifyJvmtiUnmountBegin()` not called after the field `notifyJvmtiEvents` >> value has been set to `true` when an agent library is loaded into running VM. >> The fix is to get rid of this cashing. >> Another is that enabling `notifyJvmtiEvents` notifications needs a synchronization. >> Otherwise, a VTMS transition start can be missed which will cause some asserts to fire. >> The fix is to use a JvmtiVTMSTransitionDisabler helper for sync. 
>> >> Testing: >> The originally failed tests are passed now: >> >> runtime/vthread/RedefineClass.java >> runtime/vthread/TestObjectAllocationSampleEvent.java >> >> In progress: >> Run the tiers 1-6 to make sure there are no regression. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > remove caching if notifyJvmtiEvents in yieldContinuation Marked as reviewed by lmesnik (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11304 From haosun at openjdk.org Thu Nov 24 03:01:19 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 24 Nov 2022 03:01:19 GMT Subject: RFR: 8296208: AArch64: Enable SHA512 intrinsic by default on supported hardware In-Reply-To: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com> References: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com> Message-ID: <RdCiDrgxxJAkU1Hu50jd2QGOIam-9hDbjAbZqfiQLJY=.d89258e5-0248-46a5-a99f-9739c9e565e8@github.com> On Tue, 1 Nov 2022 06:08:26 GMT, Hao Sun <haosun at openjdk.org> wrote: > SHA512 intrinsic for AArch64 was implemented in JDK-8165404. But it was not auto-enabled due to the lack of full test on real hardware. In this patch, we set this intrinsic enabled by default on hardware with sha512 feature support, after we did the following evaluation. > > 1) tier1~3 passed without new failures. > > 2) we ran the JMH test case MessageDigests.java on all available sha512 feature supported CPUs on our hands including Neoverse V1, Neoverse N2 and Apple silicon(M1). We witnessed about 1.3x ~ 3x performance uplifts. Here shows the data on V1. 
> > > Benchmark (digesterName) (length) (provider) Mode Cnt Before After Units > MessageDigests.digest SHA-384 64 DEFAULT thrpt 5 2381.028 6161.576 ops/ms > MessageDigests.digest SHA-384 16384 DEFAULT thrpt 5 20.641 60.493 ops/ms > MessageDigests.digest SHA-512 64 DEFAULT thrpt 5 2407.225 6140.680 ops/ms > MessageDigests.digest SHA-512 16384 DEFAULT thrpt 5 20.633 60.942 ops/ms > MessageDigests.getAndDigest SHA-384 64 DEFAULT thrpt 5 1962.740 4714.510 ops/ms > MessageDigests.getAndDigest SHA-384 16384 DEFAULT thrpt 5 20.474 61.360 ops/ms > MessageDigests.getAndDigest SHA-512 64 DEFAULT thrpt 5 1949.511 4552.723 ops/ms > MessageDigests.getAndDigest SHA-512 16384 DEFAULT thrpt 5 20.477 59.693 ops/ms Thanks for your reviews. I don't think the GHA failure is related to this patch. ------------- PR: https://git.openjdk.org/jdk/pull/10925 From dholmes at openjdk.org Thu Nov 24 03:34:19 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Nov 2022 03:34:19 GMT Subject: RFR: 8286185: The Java manpage can be more platform inclusive [v3] In-Reply-To: <FH8jlP5UoSWSq-pQx_I6I06ps7jFJD0bRxInfQPaMMQ=.33f566fc-8e45-40e2-b1fd-745c1f88f59c@github.com> References: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> <gs7MICgnmPqU3QYwn9XYjduMdxuyIW_mY7hWu5gBqTA=.74135117-30da-4bc4-a3d2-c06d7bf1dd8a@github.com> <FH8jlP5UoSWSq-pQx_I6I06ps7jFJD0bRxInfQPaMMQ=.33f566fc-8e45-40e2-b1fd-745c1f88f59c@github.com> Message-ID: <lkaDeejRklqtjRMYsz14L2OnXH8aQNYHRkE2-ptZ184=.98da1412-467d-472e-81b1-2114b03d9e52@github.com> On Thu, 24 Nov 2022 01:23:00 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix formatting > > Nice doc update. > Looks good to me. > Thanks, > Serguei Thanks for looking at this @sspitsyn ! 
------------- PR: https://git.openjdk.org/jdk/pull/11340 From dholmes at openjdk.org Thu Nov 24 03:35:22 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Nov 2022 03:35:22 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" [v2] In-Reply-To: <6gcjBNNHk8twSr92oF2TGB90s6F4WnFZWT4xJPmuYoc=.c9c47aad-82a6-44a3-bc50-68e2b9e0a7c6@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> <6gcjBNNHk8twSr92oF2TGB90s6F4WnFZWT4xJPmuYoc=.c9c47aad-82a6-44a3-bc50-68e2b9e0a7c6@github.com> Message-ID: <I7FOXYO1FLuu8Iay9kqGLqcE0HnNxhdo7G3sT5n1yJs=.757776d6-f9f1-4628-a6e9-36d2de9791e9@github.com> On Wed, 23 Nov 2022 19:58:41 GMT, Smita Kamath <svkamath at openjdk.org> wrote: >> 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments Our CI testing passed. ------------- PR: https://git.openjdk.org/jdk/pull/11301 From mdoerr at openjdk.org Thu Nov 24 08:37:50 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 24 Nov 2022 08:37:50 GMT Subject: Integrated: 8297445: PPC64: Represent Registers as values In-Reply-To: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> References: <IiookLRG8cpr29imRrsj_-LF5GHz1zft3F8AGQouPRE=.0b63f51e-eb0a-472a-b6ce-df4e07d1d067@github.com> Message-ID: <9DNw2NJ9IX-TVnIJWBI4PtmV6fTE2VAqC3I4n6_Cvc0=.3113dacf-0101-4f29-bf00-b88fe3574a0b@github.com> On Tue, 22 Nov 2022 18:10:42 GMT, Martin Doerr <mdoerr at openjdk.org> wrote: > The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. 
We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11 ([JDK-8297426](https://bugs.openjdk.org/browse/JDK-8297426)). > Note: Implicit conversion from `intptr_t` to `RegisterOrConstant` is no longer supported. That's why I had to replace some `add` instructions. This pull request has now been integrated. Changeset: 9c77e41b Author: Martin Doerr <mdoerr at openjdk.org> URL: https://git.openjdk.org/jdk/commit/9c77e41b81ebd28bd92ea7adad605981a5519046 Stats: 824 lines in 13 files changed: 163 ins; 429 del; 232 mod 8297445: PPC64: Represent Registers as values Reviewed-by: mbaesken, rrich ------------- PR: https://git.openjdk.org/jdk/pull/11297 From aph at openjdk.org Thu Nov 24 08:48:17 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 08:48:17 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v26] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <jZqobtM95IEtPbU4NhsxvcEcH3lKHZcky-ORJXA6rPU=.b05a4d4c-60c6-494f-b972-59777ba9fe9f@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Update src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> - Update src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/30f150e1..1395b52f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=24-25 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Thu Nov 24 09:09:27 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 09:09:27 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v25] In-Reply-To: <Lto2AKQzVhmhlJeEGUVReOe76YWav8zFweBP2Jq1JNA=.3589ae3a-eeab-436f-bf2f-54c71f29f4a6@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <9ZhPphzsMfl1vMcmbOjnzXi1THKCagn-RIE5TAE3a0M=.0e019155-9ea9-4e6c-ab03-a38cf7a7de33@github.com> <Lto2AKQzVhmhlJeEGUVReOe76YWav8zFweBP2Jq1JNA=.3589ae3a-eeab-436f-bf2f-54c71f29f4a6@github.com> Message-ID: <Ncf6rM5qiP5j960FmjLMbLLFvtkOoPRoxCzRrlk6x1M=.6abc465a-442f-4591-9d07-3f326c90d561@github.com> On Wed, 23 Nov 2022 19:16:23 GMT, Paul Sandoz <psandoz at openjdk.org> wrote: > Looks good (just some last minor comments). I did not focus on the tests, nor too closely at all of the HotSpot changes. 
> > Something for future investigation perhaps (if not already thought about): consider using a persistent map, a Hash Array Mapped Trie (HAMT), for storing scoped keys and values, which could potentially remove the need for the cache of replace the cache when many values are in scope. The HAMT's structural sharing properties, wide-branching factor, and `Integer.bitCount` being intrinsic all make for an efficient implementation. I've certainly considered a HAMT. (I've considered everything! It's been a long journey.) The trouble is that using one penalizes binding operations because a persistent HAMT requires path copying for insertions. This won't matter if your bind operation is rare, such as at the start of a large block of code. However, it might not be. Consider a common case such as a parallel stream using fork/join. void parallelRunWithCredentials(List<?> aList, Runnable aRunnable) { var permissions = ScopedValue.where(PERMISSION, getCredentials()); aList.parallelStream().forEach(() -> permissions.run(aRunnable)); } Because the binding operation `permissions.run()` is invoked for every element on the list, we must make the binding operation as fast as it can possibly be. In addition, the cache is _extremely_ fast. Typically it's four instructions, and C2 almost always hoists values, once looked up in cache, into registers. A HAMT lookup, while fast and (mostly) O(1), is more complex. 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From haosun at openjdk.org Thu Nov 24 09:13:50 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 24 Nov 2022 09:13:50 GMT Subject: Integrated: 8296208: AArch64: Enable SHA512 intrinsic by default on supported hardware In-Reply-To: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com> References: <9xWFCh2jeZ21K8ORvblluWxe6F6vDqVyaFSr-U-BXQk=.a8cdb873-91fd-444a-b611-666808bf8d5d@github.com> Message-ID: <qSzGllCUh7u7qAHi0uYyjMi48NqgR0C34kUjQqZ4yC4=.a3dc2d1e-f57e-4e93-9c62-33ab4ab51ef1@github.com> On Tue, 1 Nov 2022 06:08:26 GMT, Hao Sun <haosun at openjdk.org> wrote: > SHA512 intrinsic for AArch64 was implemented in JDK-8165404. But it was not auto-enabled due to the lack of full test on real hardware. In this patch, we set this intrinsic enabled by default on hardware with sha512 feature support, after we did the following evaluation. > > 1) tier1~3 passed without new failures. > > 2) we ran the JMH test case MessageDigests.java on all available sha512 feature supported CPUs on our hands including Neoverse V1, Neoverse N2 and Apple silicon(M1). We witnessed about 1.3x ~ 3x performance uplifts. Here shows the data on V1. > > > Benchmark (digesterName) (length) (provider) Mode Cnt Before After Units > MessageDigests.digest SHA-384 64 DEFAULT thrpt 5 2381.028 6161.576 ops/ms > MessageDigests.digest SHA-384 16384 DEFAULT thrpt 5 20.641 60.493 ops/ms > MessageDigests.digest SHA-512 64 DEFAULT thrpt 5 2407.225 6140.680 ops/ms > MessageDigests.digest SHA-512 16384 DEFAULT thrpt 5 20.633 60.942 ops/ms > MessageDigests.getAndDigest SHA-384 64 DEFAULT thrpt 5 1962.740 4714.510 ops/ms > MessageDigests.getAndDigest SHA-384 16384 DEFAULT thrpt 5 20.474 61.360 ops/ms > MessageDigests.getAndDigest SHA-512 64 DEFAULT thrpt 5 1949.511 4552.723 ops/ms > MessageDigests.getAndDigest SHA-512 16384 DEFAULT thrpt 5 20.477 59.693 ops/ms This pull request has now been integrated. 
Changeset: 8b739706 Author: Hao Sun <haosun at openjdk.org> Committer: Nick Gasson <ngasson at openjdk.org> URL: https://git.openjdk.org/jdk/commit/8b7397064b5b492b03bc8363f6ba74c70ce7d4a0 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod 8296208: AArch64: Enable SHA512 intrinsic by default on supported hardware Reviewed-by: njian, ngasson ------------- PR: https://git.openjdk.org/jdk/pull/10925 From aph at openjdk.org Thu Nov 24 09:29:27 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 09:29:27 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v25] In-Reply-To: <Lto2AKQzVhmhlJeEGUVReOe76YWav8zFweBP2Jq1JNA=.3589ae3a-eeab-436f-bf2f-54c71f29f4a6@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <9ZhPphzsMfl1vMcmbOjnzXi1THKCagn-RIE5TAE3a0M=.0e019155-9ea9-4e6c-ab03-a38cf7a7de33@github.com> <Lto2AKQzVhmhlJeEGUVReOe76YWav8zFweBP2Jq1JNA=.3589ae3a-eeab-436f-bf2f-54c71f29f4a6@github.com> Message-ID: <UfQPCmn8ML0YG1uxizsdAleLFUW2USTng_I8Fh_3_Vw=.453ddf9c-14b6-48f1-98cd-c40e45e2e885@github.com> On Wed, 23 Nov 2022 18:49:07 GMT, Paul Sandoz <psandoz at openjdk.org> wrote: >> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 52 commits: >> >> - Merge master >> - javadoc >> - Feedback from reviewers >> - Update src/hotspot/share/classfile/vmSymbols.hpp >> - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 >> - Update src/java.base/share/classes/java/lang/VirtualThread.java >> >> Co-authored-by: Alan Bateman <Alan.Bateman at oracle.com> >> - Update src/java.base/share/classes/java/lang/Thread.java >> >> Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> >> - Update src/hotspot/cpu/aarch64/aarch64.ad >> >> Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> >> - Feedback from reviewers >> - Remove incorrect assertion. >> - ... and 42 more: https://git.openjdk.org/jdk/compare/2afb4c33...30f150e1 > > src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 209: > >> 207: final int bitmask; >> 208: >> 209: private static final Object NIL = new Object(); > > Suggestion: > > static final Object NO_VALUE = new Object(); It's not very important, but I'm going to push back (very gently) on this one. "nil: noun. nothing; naught; zero. adjective. having no value or existence." That is the exact literal meaning of this sentinel. Also, "nil" has been used with this meaning in programming languages for 60 years. What is your objection to it here? 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From sgehwolf at openjdk.org Thu Nov 24 10:07:50 2022 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 24 Nov 2022 10:07:50 GMT Subject: Integrated: 8296671: [JFR] jdk.ContainerConfiguration event should include host total memory In-Reply-To: <oeClK9flGtbMJwedbMytX5I_94NaXNkNQ9XMjACBsSI=.7d360f67-460a-4631-9743-32e970b16a9f@github.com> References: <oeClK9flGtbMJwedbMytX5I_94NaXNkNQ9XMjACBsSI=.7d360f67-460a-4631-9743-32e970b16a9f@github.com> Message-ID: <eTmLfZAK0DL1mNbdlBrcVb7zaERG3Qqvide6A4aoLx0=.bb5d36b4-d173-4e83-9350-cc1d7a28e4cd@github.com> On Mon, 14 Nov 2022 20:19:29 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote: > Please review this addition to the jdk.ContainerConfiguration event which adds information > about the container host. Specifically, the total amount of memory of the host system. > > Testing: > - [x] New test case (passed, fails before) > - [x] JFR tests. Passed. > > Thoughts? This pull request has now been integrated. Changeset: 3c4d5204 Author: Severin Gehwolf <sgehwolf at openjdk.org> URL: https://git.openjdk.org/jdk/commit/3c4d5204ff96280b123f42a8cfbaef308e470b69 Stats: 48 lines in 7 files changed: 43 ins; 0 del; 5 mod 8296671: [JFR] jdk.ContainerConfiguration event should include host total memory Reviewed-by: egahlin ------------- PR: https://git.openjdk.org/jdk/pull/11143 From aph at openjdk.org Thu Nov 24 10:26:49 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 10:26:49 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v27] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <IBtCXRkLuTntIjl0zgLWvBwGFUnnIStbA6SpbLBZtWQ=.85e76fe9-7415-4081-91d7-8bc0990f1c26@github.com> > JEP 429 implementation.
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 - Fix merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/1395b52f..3a6f8037 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=25-26 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Thu Nov 24 10:31:08 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 10:31:08 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v28] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <q4cDWQOjrntD6P6Z6TmoqrrozjgLXJ3ZoeujSSaYYvo=.ab5f15c4-cfc6-4cf8-9dc3-ff4e210f2002@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Reviewer feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/3a6f8037..4bcfa52e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=26-27 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Thu Nov 24 10:31:16 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 10:31:16 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v25] In-Reply-To: <Lto2AKQzVhmhlJeEGUVReOe76YWav8zFweBP2Jq1JNA=.3589ae3a-eeab-436f-bf2f-54c71f29f4a6@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <9ZhPphzsMfl1vMcmbOjnzXi1THKCagn-RIE5TAE3a0M=.0e019155-9ea9-4e6c-ab03-a38cf7a7de33@github.com> <Lto2AKQzVhmhlJeEGUVReOe76YWav8zFweBP2Jq1JNA=.3589ae3a-eeab-436f-bf2f-54c71f29f4a6@github.com> Message-ID: <bD4cLbVgLOB8YvcUZn_mB4hN0kIJjqwv8zAoTjuCcjc=.8b5a0595-20da-4ded-a0b7-fd7a4ccec8fc@github.com> On Wed, 23 Nov 2022 18:47:28 GMT, Paul Sandoz <psandoz at openjdk.org> wrote: >> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 52 commits: >> >> - Merge master >> - javadoc >> - Feedback from reviewers >> - Update src/hotspot/share/classfile/vmSymbols.hpp >> - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 >> - Update src/java.base/share/classes/java/lang/VirtualThread.java >> >> Co-authored-by: Alan Bateman <Alan.Bateman at oracle.com> >> - Update src/java.base/share/classes/java/lang/Thread.java >> >> Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> >> - Update src/hotspot/cpu/aarch64/aarch64.ad >> >> Co-authored-by: ExE Boss <3889017+ExE-Boss at users.noreply.github.com> >> - Feedback from reviewers >> - Remove incorrect assertion. >> - ... and 42 more: https://git.openjdk.org/jdk/compare/2afb4c33...30f150e1 > > src/hotspot/share/classfile/vmSymbols.hpp line 401: > >> 399: template(daemon_name, "daemon") \ >> 400: template(run_method_name, "run") \ >> 401: template(call_method_name, "call") \ > > Is this used? No. ------------- PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Thu Nov 24 10:36:16 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 10:36:16 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v29] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <JbMgL6lq9fAgXToGGQXTZ0VCLaxqhIcuyfV1s2hYQuc=.c0206b78-b819-4e47-94a1-8b074fb9649b@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/4bcfa52e..c10a5d79 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=27-28 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Thu Nov 24 10:46:56 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 10:46:56 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v30] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <a-G2J0JLXbKXOvaAwKgsWwXwwxYX3Wqak7jud8-e8k0=.dc7a4da5-5151-4ec8-9cc8-21451ef462df@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with five additional commits since the last revision: - Merge branch 'JDK-8286666' of https://github.com/theRealAph/jdk into JDK-8286666 - Update src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> - Update src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> - Reviewer feedback - Update src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/c10a5d79..15db2a30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=28-29 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Thu Nov 24 10:49:56 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 10:49:56 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v31] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <bsuy-9LXwmiTxhEFGZvumImy26Z7eX7Xl_Gn3dFHmDs=.4cc221d6-dce7-4006-a9b0-3fb3f1c5359f@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/15db2a30..903780d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=29-30 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From aph at openjdk.org Thu Nov 24 10:57:51 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 10:57:51 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v32] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <Ib9AGPSzmfQZO6BaBJnEHR3YVjg6UKGEArwR_Xb19Wc=.c9c250cd-0ed6-4c0e-92e9-599ce2adc8d0@github.com> > JEP 429 implementation. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/903780d6..1b3c39bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=30-31 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From alanb at openjdk.org Thu Nov 24 11:53:29 2022 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 24 Nov 2022 11:53:29 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2] In-Reply-To: <mBAIPYIyFTsHqBrLMOSuKna6IR-wbWgF1IVxnl2JGe0=.7c7fac27-8946-412e-9f27-d34ebec273c8@github.com> References: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> <mBAIPYIyFTsHqBrLMOSuKna6IR-wbWgF1IVxnl2JGe0=.7c7fac27-8946-412e-9f27-d34ebec273c8@github.com> Message-ID: <DZcjjvwbuHSQU-Soa9dsdAh6ApNf5BJYagaABFIaiFQ=.d32e534f-8841-4333-acdb-88acd72b5022@github.com> On Wed, 23 Nov 2022 10:14:23 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> This problem has two sides. >> One is that the `VirtualThread::run()` caches the field `notifyJvmtiEvents` value. >> It caused the native method `notifyJvmtiUnmountBegin()` not to be called after the field `notifyJvmtiEvents` >> value has been set to `true` when an agent library is loaded into a running VM. >> The fix is to get rid of this caching. >> Another is that enabling `notifyJvmtiEvents` notifications needs synchronization. >> Otherwise, a VTMS transition start can be missed which will cause some asserts to fire. >> The fix is to use a JvmtiVTMSTransitionDisabler helper for sync.
>> >> Testing: >> The originally failing tests pass now: >> >> runtime/vthread/RedefineClass.java >> runtime/vthread/TestObjectAllocationSampleEvent.java >> >> In progress: >> Run the tiers 1-6 to make sure there are no regressions. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > remove caching if notifyJvmtiEvents in yieldContinuation Would it be possible to summarize the behavior when an agent enables the capability as a virtual thread executes for the first time or continues after a yield? More specifically, JVMTI will be notified of a mount end without a corresponding mount begin. It might be that we can narrow this down to whether finish_VTMS_transition is okay without a preceding start_VTMS_transition. ------------- PR: https://git.openjdk.org/jdk/pull/11304 From sspitsyn at openjdk.org Thu Nov 24 12:48:59 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 24 Nov 2022 12:48:59 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2] In-Reply-To: <mBAIPYIyFTsHqBrLMOSuKna6IR-wbWgF1IVxnl2JGe0=.7c7fac27-8946-412e-9f27-d34ebec273c8@github.com> References: <Q9aSWyhdNcS73UVmoosyrDRJZnqqAPk_G99quTgutxA=.76691773-e4c2-4190-bc6d-e54d11f1ae33@github.com> <mBAIPYIyFTsHqBrLMOSuKna6IR-wbWgF1IVxnl2JGe0=.7c7fac27-8946-412e-9f27-d34ebec273c8@github.com> Message-ID: <EHOT_Q360AsxHbkidYtVt7nLcqsMUB2vIsHFAEtL8ro=.1c2639b4-ea05-4526-9a05-9de838465aa9@github.com> On Wed, 23 Nov 2022 10:14:23 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> This problem has two sides. >> One is that the `VirtualThread::run()` caches the field `notifyJvmtiEvents` value. >> It caused the native method `notifyJvmtiUnmountBegin()` not to be called after the field `notifyJvmtiEvents` >> value has been set to `true` when an agent library is loaded into a running VM. >> The fix is to get rid of this caching. >> Another is that enabling `notifyJvmtiEvents` notifications needs synchronization.
>> Otherwise, a VTMS transition start can be missed which will cause some asserts to fire. >> The fix is to use a JvmtiVTMSTransitionDisabler helper for sync. >> >> Testing: >> The originally failing tests pass now: >> >> runtime/vthread/RedefineClass.java >> runtime/vthread/TestObjectAllocationSampleEvent.java >> >> In progress: >> Run the tiers 1-6 to make sure there are no regressions. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > remove caching if notifyJvmtiEvents in yieldContinuation I'd forgotten that the `JvmtiVTMSTransitionDisabler` is not going to work before `notifyJvmtiEvents` is set to `true`. I agree, we may want to allow `start_VTMS_transition/finish_VTMS_transition` to be not properly paired, as you suggest. But then it is not good that we lose the ability to strictly check/assert pairing of VTMS transition notifications for other cases. Need to think a bit more on this. ------------- PR: https://git.openjdk.org/jdk/pull/11304 From stefank at openjdk.org Thu Nov 24 13:21:32 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 24 Nov 2022 13:21:32 GMT Subject: RFR: 8296886: Fix various include sort order issues [v3] In-Reply-To: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> Message-ID: <cfAG-KZqSsFuVgG1zJ2w5lsRQK2LhioHEZZFCi4lcrY=.6ac8d359-4467-4bc9-83dc-28e6d64372d5@github.com> > The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. > > *EDIT*: The below discussion has been deferred out of this PR. Now this only deals with fixing the placement and sorting of includes, plus some surrounding blank lines. > > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted.
There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. One argument why it was OK to have the files in include/ pushed up to the top of the sorted block was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > While going over the include headers I've also cleaned up surrounding whitespace and incorrect include guards. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits: - Merge remote-tracking branch 'upstream/master' into 8296886_various_include_order_fixes - Cleanups - Merge remote-tracking branch 'upstream/master' into various_include_order_fixes - Various include order fixes ------------- Changes: https://git.openjdk.org/jdk/pull/11108/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11108&range=02 Stats: 325 lines in 116 files changed: 143 ins; 164 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/11108.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11108/head:pull/11108 PR: https://git.openjdk.org/jdk/pull/11108 From stefank at openjdk.org Thu Nov 24 13:21:33 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 24 Nov 2022 13:21:33 GMT Subject: RFR: 8296886: Fix various include sort order issues [v2] In-Reply-To: <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> <JjDvqbScU0zoekDaaXbN4IUU0bv9tdm2Pvev5sqVggw=.e17fde8e-4462-4803-88e8-d1af700812e7@github.com> Message-ID: <aB2D2T5oao8JflrSY4Z7sVOqRW9XaDbbwBXWaAvgxko=.5037b486-5396-4f8e-abfc-6bce36d9c5a1@github.com> On Wed, 16 Nov 2022 16:17:48 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: >> The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. >> >> *EDIT*: The below discussion has been deferred out of this PR. Now this only deals with fixing the placement and sorting of includes, plus some surrounding blank lines. >> >> One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. 
One argument why it was OK to have the files in include/ pushed up to the top of the sorted block was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. >> >> While going over the include headers I've also cleaned up surrounding whitespace and incorrect include guards. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Cleanups > - Merge remote-tracking branch 'upstream/master' into various_include_order_fixes > - Various include order fixes Thanks all for reviewing! I'm going to let the GHA run before integrating this change. ------------- PR: https://git.openjdk.org/jdk/pull/11108 From aph at openjdk.org Thu Nov 24 14:05:41 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Nov 2022 14:05:41 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33] In-Reply-To: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> Message-ID: <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> > JEP 429 implementation.
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Unused variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10952/files - new: https://git.openjdk.org/jdk/pull/10952/files/1b3c39bc..37441eeb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10952&range=31-32 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10952.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10952/head:pull/10952 PR: https://git.openjdk.org/jdk/pull/10952 From stefank at openjdk.org Thu Nov 24 15:10:25 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 24 Nov 2022 15:10:25 GMT Subject: Integrated: 8296886: Fix various include sort order issues In-Reply-To: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> References: <qmA4OGVYmaXWA4xQMksmBuQPTFZuNTuLSB7qShPbtwI=.d43529a9-f963-44fc-b760-ea35d8e9ed0b@github.com> Message-ID: <asFADZv1u0zYb1T9YA5_T7IUi1ul8A786WlEwh3vJyQ=.4896c3d5-6f4f-4b8e-be29-36e73580b956@github.com> On Fri, 11 Nov 2022 14:26:20 GMT, Stefan Karlsson <stefank at openjdk.org> wrote: > The sorted blocks of includes have deteriorated to the point that I felt compelled to clean up some of the issues. > > *EDIT*: The below discussion has been deferred out of this PR. Now this only deals with fixing the placement and sorting of includes, plus some surrounding blank lines. > > One of the more prevalent issues is that files in src/hotspot/share/include are not properly sorted. There has been some discussion that that was done on purpose, but it just adds another exception to the include rules that don't have any practical purposes, IMHO. It also goes against our written style guide around include files. 
One argument why it was OK to have the files in include/ pushed up to the top of the sorted block was that the file was included without specifying a directory. That's an argument that contradicts how we treat platform-dependent files, which (unfortunately) often also are specified without a prefixed directory, so I don't think that's a good enough argument, again IMHO. To remove this special case, I've removed the extraneous make file entry to have src/hotspot/share/include in the set of directories to search for headers when compiling HotSpot. Now all the header files in src/hotspot/share/include get included by specifying the path from src/hotspot/share, just like the other platform-independent headers in HotSpot. > > While going over the include headers I've also cleaned up surrounding whitespace and incorrect include guards. This pull request has now been integrated. Changeset: df6cf1e4 Author: Stefan Karlsson <stefank at openjdk.org> URL: https://git.openjdk.org/jdk/commit/df6cf1e41d0fc2dd5f5c094f66c7c8969cf5548d Stats: 325 lines in 116 files changed: 143 ins; 164 del; 18 mod 8296886: Fix various include sort order issues Reviewed-by: kbarrett, dholmes, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/11108 From bkilambi at openjdk.org Thu Nov 24 15:56:08 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 24 Nov 2022 15:56:08 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v5] In-Reply-To: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> Message-ID: <askgzzGaeFCd8lefaQsn090eUx0HLZ-nn3lmz0ZjSOw=.457403b6-5564-4e30-a55d-3b6dbe59d7ca@github.com> > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors.
This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - > > eor a, a, b > eor a, a, c > > can be optimized to single instruction - `eor3 a, b, c` > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - > > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > > The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Resolve merge conflicts with master - Merge branch 'master' into JDK-8293488 - Removed svesha3 feature check for eor3 - Changed the modifier order preference in JTREG test - Modified JTREG test to include feature constraints - 8293488: Add EOR3 backend rule for aarch64 SHA3 extension Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. 
For example - eor a, a, b eor a, a, c can be optimized to single instruction - eor3 a, b, c This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - Benchmark gain TestEor3.test1Int 10.87% TestEor3.test1Long 8.84% TestEor3.test2Int 21.68% TestEor3.test2Long 21.04% The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. ------------- Changes: https://git.openjdk.org/jdk/pull/10407/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10407&range=04 Stats: 330 lines in 7 files changed: 295 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/10407.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10407/head:pull/10407 PR: https://git.openjdk.org/jdk/pull/10407 From rehn at openjdk.org Thu Nov 24 16:19:36 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 24 Nov 2022 16:19:36 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks [v4] In-Reply-To: <LuMQx6RZTOp5_Yl_v6gFaQw4lMzhdTv7neo45cZX1Pk=.d1968e1b-3384-414a-8573-745e1e41faba@github.com> References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> <LuMQx6RZTOp5_Yl_v6gFaQw4lMzhdTv7neo45cZX1Pk=.d1968e1b-3384-414a-8573-745e1e41faba@github.com> Message-ID: <7V4P4CjC-2X99SkP8aw4kIm7S1_Vth4q_ClnqJoKA0A=.7db6d3bf-f081-4900-9531-729be031cfad@github.com> On Mon, 14 Nov 2022 10:16:58 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> When doing performance- and footprint analysis, `AlwaysPreTouch` 
option is very handy for reducing noise. It would be good to have a similar option for pre-touching thread stacks. In addition to reducing noise, it can serve as worst-case test for thread costs, as well as a test for NMT regressions. >> >> Patch adds a new diagnostic switch, `AlwaysPreTouchStacks`, as a companion switch to `AlwaysPreTouch`. Touching is super-simple using `alloca()`. Also, regression test. >> >> Examples: >> >> NMT, thread stacks, 10000 Threads, default: >> >> >> - Thread (reserved=10332400KB, committed=331828KB) >> (thread #10021) >> (stack: reserved=10301560KB, committed=300988KB) >> (malloc=19101KB #60755) >> (arena=11739KB #20037) >> >> >> NMT, thread stacks, 10000 Threads, +AlwaysPreTouchStacks: >> >> >> - Thread (reserved=10332400KB, committed=10284360KB) >> (thread #10021) >> (stack: reserved=10301560KB, committed=10253520KB) >> (malloc=19101KB #60755) >> (arena=11739KB #20037) > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > test changes, comment change Hey, a question: what do you think about just pre-touching a part of the stack? ------------- PR: https://git.openjdk.org/jdk/pull/10403 From shade at openjdk.org Thu Nov 24 19:30:24 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 24 Nov 2022 19:30:24 GMT Subject: RFR: 8297600: Check current thread in selected JRT_LEAF methods Message-ID: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> With [JDK-8275286](https://bugs.openjdk.org/browse/JDK-8275286), we added the `Thread::current()` checks for most of the JRT entries. But `JRT_LEAF` is still not checked, because not every `JRT_LEAF` carries a `JavaThread` argument. Having assertions there helps for two reasons. First, these methods can be called from the stub/compiler code, which might be erroneous with thread handling (especially in x86_32 that does not have a dedicated thread register).
Second, in the post-Loom world, current thread can change suddenly, as evidenced here: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2022-November/060779.html. We can add the thread checks to relevant `JRT_LEAF` methods that accept `JavaThread*` too. Additional testing: - [x] Linux x86_64 fastdebug `tier1` - [x] Linux x86_64 fastdebug `tier2` - [ ] Linux x86_32 fastdebug `tier1` - [ ] Linux x86_32 fastdebug `tier2` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/11359/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11359&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297600 Stats: 32 lines in 8 files changed: 32 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11359.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11359/head:pull/11359 PR: https://git.openjdk.org/jdk/pull/11359 From rehn at openjdk.org Fri Nov 25 11:07:34 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 25 Nov 2022 11:07:34 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks [v4] In-Reply-To: <LuMQx6RZTOp5_Yl_v6gFaQw4lMzhdTv7neo45cZX1Pk=.d1968e1b-3384-414a-8573-745e1e41faba@github.com> References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> <LuMQx6RZTOp5_Yl_v6gFaQw4lMzhdTv7neo45cZX1Pk=.d1968e1b-3384-414a-8573-745e1e41faba@github.com> Message-ID: <z4fXh8kYYdaorgkU9H4a6-MHRWTItuZmtnJgA5fKJWs=.19d34406-4c64-4b69-8652-6b8f7204bed3@github.com> On Mon, 14 Nov 2022 10:16:58 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> When doing performance- and footprint analysis, `AlwaysPreTouch` option is very handy for reducing noise. It would be good to have a similar option for pre-touching thread stacks. In addition to reducing noise, it can serve as worst-case test for thread costs, as well as a test for NMT regressions. >> >> Patch adds a new diagnostic switch, `AlwaysPreTouchStacks`, as a companion switch to `AlwaysPreTouch`. 
Touching is super-simple using `alloca()`. Also, regression test. >> >> Examples: >> >> NMT, thread stacks, 10000 Threads, default: >> >> >> - Thread (reserved=10332400KB, committed=331828KB) >> (thread #10021) >> (stack: reserved=10301560KB, committed=300988KB) >> (malloc=19101KB #60755) >> (arena=11739KB #20037) >> >> >> NMT, thread stacks, 10000 Threads, +AlwaysPreTouchStacks: >> >> >> - Thread (reserved=10332400KB, committed=10284360KB) >> (thread #10021) >> (stack: reserved=10301560KB, committed=10253520KB) >> (malloc=19101KB #60755) >> (arena=11739KB #20037) > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > test changes, comment change I have seen some workloads where just one or two threads are using large stacks and the majority are not. I.e. with -Xss256m and similar, having a few hundred threads and pre-touching both stacks and heap, you may run out of physical pages and start swapping. It would be useful in such a case to just pre-touch the first meg or so of the stack. I'm thinking -Xss256m -XX:PreTouchStackSize=1m. Maybe it is not as useful as in my head... Also I don't know if a user can retrieve good information about stack size usage. I'll let you be the arbiter. I'll approve it as is, if this is what you think is best, thanks. ------------- Marked as reviewed by rehn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10403 From stuefe at openjdk.org Fri Nov 25 11:07:40 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 25 Nov 2022 11:07:40 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks [v4] In-Reply-To: <7V4P4CjC-2X99SkP8aw4kIm7S1_Vth4q_ClnqJoKA0A=.7db6d3bf-f081-4900-9531-729be031cfad@github.com> References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> <LuMQx6RZTOp5_Yl_v6gFaQw4lMzhdTv7neo45cZX1Pk=.d1968e1b-3384-414a-8573-745e1e41faba@github.com> <7V4P4CjC-2X99SkP8aw4kIm7S1_Vth4q_ClnqJoKA0A=.7db6d3bf-f081-4900-9531-729be031cfad@github.com> Message-ID: <UzDHKa922TY5QlgqC92ZTIrqZDrHKnM4N_mwhGD1TSs=.2c5ded4f-6245-4c58-9548-4cbab72ff972@github.com> On Thu, 24 Nov 2022 16:17:26 GMT, Robbin Ehn <rehn at openjdk.org> wrote: > Hey, a question: what do you think about just pre-touching a part of the stack? Hmm, I guess that would work, but what use case did you have in mind? Also, how would we specify it, as a percentage of the thread stack size, or as an absolute value? ------------- PR: https://git.openjdk.org/jdk/pull/10403 From bkilambi at openjdk.org Fri Nov 25 11:07:45 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 25 Nov 2022 11:07:45 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v5] In-Reply-To: <askgzzGaeFCd8lefaQsn090eUx0HLZ-nn3lmz0ZjSOw=.457403b6-5564-4e30-a55d-3b6dbe59d7ca@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> <askgzzGaeFCd8lefaQsn090eUx0HLZ-nn3lmz0ZjSOw=.457403b6-5564-4e30-a55d-3b6dbe59d7ca@github.com> Message-ID: <ehI3TawhCN3CY0S0hllf7GERELtPsRMooduYK5zrPdw=.ed264691-a431-43bb-9740-7ea1b2aae964@github.com> On Thu, 24 Nov 2022 15:56:08 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors.
This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Resolve merge conflicts with master > - Merge branch 'master' into JDK-8293488 > - Removed svesha3 feature check for eor3 > - Changed the modifier order preference in JTREG test > - Modified JTREG test to include feature constraints > - 8293488: Add EOR3 backend rule for aarch64 SHA3 extension > > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those > SHA3 instructions - "eor3" performs an exclusive OR of three vectors. > This is helpful in applications that have multiple, consecutive "eor" > operations which can be reduced by clubbing them into fewer operations > using the "eor3" instruction. 
For example - > eor a, a, b > eor a, a, c > can be optimized to single instruction - eor3 a, b, c > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and > a micro benchmark to assess the performance gains with this patch. > Following are the results of the included micro benchmark on a 128-bit > aarch64 machine that supports Neon, SVE2 and SHA3 features - > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > The numbers shown are performance gains with using Neon eor3 instruction > over the master branch that uses multiple "eor" instructions instead. > Similar gains can be observed with the SVE2 "eor3" version as well since > the "eor3" instruction is unpredicated and the machine under test uses a > maximum vector width of 128 bits which makes the SVE2 code generation very > similar to the one with Neon. Hello, I have resolved merge conflicts and have uploaded the latest patch here. Please review. I messed up with the re-request review option but I now understand how it works (I thought I could re-request from everyone but realized that the previous selection is removed if I select another reviewer). Apologies for any inconvenience caused. ------------- PR: https://git.openjdk.org/jdk/pull/10407 From epeter at openjdk.org Fri Nov 25 14:06:09 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 25 Nov 2022 14:06:09 GMT Subject: RFR: 8297640: Increase buffer size for buf (insert_features_names) in Abstract_VM_Version::insert_features_names Message-ID: <Zv4q7sJKzMEoQzzL0OcWOPo_69ayAvluB22NC_Dund8=.efd7319f-09d5-4afb-a6af-bf0b8d6c0cdc@github.com> As described in [JDK-8297640](https://bugs.openjdk.org/browse/JDK-8297640), the buffer is too small, I increased the size from 512 to 1024. The string needs to be a few characters larger, now that we have the additional feature `avx512_ifma`, added in [JDK-8288047](https://bugs.openjdk.org/browse/JDK-8288047).
I manually tested it with `./java --version`, used to crash, now works. Running larger test suite now... ------------- Commit messages: - 8297640: Increase buffer size for buf (insert_features_names) in Abstract_VM_Version::insert_features_names Changes: https://git.openjdk.org/jdk/pull/11366/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11366&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297640 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11366.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11366/head:pull/11366 PR: https://git.openjdk.org/jdk/pull/11366 From chagedorn at openjdk.org Fri Nov 25 14:06:09 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 25 Nov 2022 14:06:09 GMT Subject: RFR: 8297640: Increase buffer size for buf (insert_features_names) in Abstract_VM_Version::insert_features_names In-Reply-To: <Zv4q7sJKzMEoQzzL0OcWOPo_69ayAvluB22NC_Dund8=.efd7319f-09d5-4afb-a6af-bf0b8d6c0cdc@github.com> References: <Zv4q7sJKzMEoQzzL0OcWOPo_69ayAvluB22NC_Dund8=.efd7319f-09d5-4afb-a6af-bf0b8d6c0cdc@github.com> Message-ID: <wUQo1Lhppe_V4aTkGkqoryxFMfOJezIATcbhMp1NqSM=.4b1ef1ca-b563-4e63-a785-77179d5ed194@github.com> On Fri, 25 Nov 2022 13:46:15 GMT, Emanuel Peter <epeter at openjdk.org> wrote: > As described in [JDK-8297640](https://bugs.openjdk.org/browse/JDK-8297640), the buffer is too small, I increased the size from 512 to 1024. > > The string needs to be a few characters larger, now that we have the additional feature `avx512_ifma`, added in [JDK-8288047](https://bugs.openjdk.org/browse/JDK-8288047). > > I manually tested it with `./java --version`, used to crash, now works. > Running larger test suite now... That looks reasonable! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11366 From rehn at openjdk.org Fri Nov 25 14:45:17 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 25 Nov 2022 14:45:17 GMT Subject: RFR: 8297640: Increase buffer size for buf (insert_features_names) in Abstract_VM_Version::insert_features_names In-Reply-To: <Zv4q7sJKzMEoQzzL0OcWOPo_69ayAvluB22NC_Dund8=.efd7319f-09d5-4afb-a6af-bf0b8d6c0cdc@github.com> References: <Zv4q7sJKzMEoQzzL0OcWOPo_69ayAvluB22NC_Dund8=.efd7319f-09d5-4afb-a6af-bf0b8d6c0cdc@github.com> Message-ID: <1e6lURBQT0cWfKMw-Sf4w2kkEDkedtCiDuG4pTeqUlI=.25cb2636-d9a0-4368-85e9-40fcfbdb4262@github.com> On Fri, 25 Nov 2022 13:46:15 GMT, Emanuel Peter <epeter at openjdk.org> wrote: > As described in [JDK-8297640](https://bugs.openjdk.org/browse/JDK-8297640), the buffer is too small, I increased the size from 512 to 1024. > > The string needs to be a few characters larger, now that we have the additional feature `avx512_ifma`, added in [JDK-8288047](https://bugs.openjdk.org/browse/JDK-8288047). > > I manually tested it with `./java --version`, used to crash, now works. > Running larger test suite now... Thanks! ------------- Marked as reviewed by rehn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11366 From stuefe at openjdk.org Fri Nov 25 15:34:14 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 25 Nov 2022 15:34:14 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks [v4] In-Reply-To: <z4fXh8kYYdaorgkU9H4a6-MHRWTItuZmtnJgA5fKJWs=.19d34406-4c64-4b69-8652-6b8f7204bed3@github.com> References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> <LuMQx6RZTOp5_Yl_v6gFaQw4lMzhdTv7neo45cZX1Pk=.d1968e1b-3384-414a-8573-745e1e41faba@github.com> <z4fXh8kYYdaorgkU9H4a6-MHRWTItuZmtnJgA5fKJWs=.19d34406-4c64-4b69-8652-6b8f7204bed3@github.com> Message-ID: <ljhWVNRnmTwVpen2sk4xLWPSWhIjtLojkfffI2QIgKg=.a183d55f-9444-4b02-8614-b6577520e65f@github.com> On Fri, 25 Nov 2022 10:30:33 GMT, Robbin Ehn <rehn at openjdk.org> wrote: > I have seen some workloads where just one or two threads are using large stacks and the majority are not. I.e. with -Xss256m and similar, Yikes... >having a few hundred threads and pre-touching both stacks and heap, you may run out of physical pages and start swapping. It would be useful in such a case to just pre-touch the first meg or so of the stack. > > I'm thinking -Xss256m -XX:PreTouchStackSize=1m. > > Maybe it is not as useful as in my head... Also I don't know if a user can retrieve good information about stack size usage. > > I'll let you be the arbiter. > I'll think about it. It may make sense considering your example. ------------- PR: https://git.openjdk.org/jdk/pull/10403 From stuefe at openjdk.org Fri Nov 25 18:36:33 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 25 Nov 2022 18:36:33 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray Message-ID: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> In `C1_MacroAssembler::initialize_body()` we test the input array length for 0. We do this again in `MacroAssembler::zero_memory()`.
This results in a redundant test+jump instruction: 13442 0x00007f58a8ca2f02: sub $0x10,%rsi 13443 0x00007f58a8ca2f06: je 0x00007f58a8ca2f26 << 13444 0x00007f58a8ca2f0c: test %rsi,%rsi 13445 0x00007f58a8ca2f0f: je 0x00007f58a8ca2f26 << 13446 0x00007f58a8ca2f15: xor %rbx,%rbx 13447 0x00007f58a8ca2f18: shr $0x3,%rsi 13448 0x00007f58a8ca2f1c: mov %rbx,0x8(%rax,%rsi,8) 13449 0x00007f58a8ca2f21: dec %rsi 13450 0x00007f58a8ca2f24: jne 0x00007f58a8ca2f1c ;*anewarray {reexecute=0 rethrow=0 return_oop=0} 13451 ; - java.lang.invoke.MethodHandles::<clinit>@24 (line 5109) Since `MacroAssembler::zero_memory()` is only ever called from `C1_MacroAssembler::initialize_body()`, it does not need to test for len=0, since its caller already does. --- Patch removes one test+jump and adds an assertion for len>0 to zero_memory. Patch ran through SAP nightlies and GHAs. ------------- Commit messages: - remove-redundant-test-from-zeromemory Changes: https://git.openjdk.org/jdk/pull/11372/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11372&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297660 Stats: 11 lines in 1 file changed: 7 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11372.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11372/head:pull/11372 PR: https://git.openjdk.org/jdk/pull/11372 From kvn at openjdk.org Fri Nov 25 20:26:23 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Nov 2022 20:26:23 GMT Subject: RFR: 8286185: The Java manpage can be more platform inclusive [v3] In-Reply-To: <gs7MICgnmPqU3QYwn9XYjduMdxuyIW_mY7hWu5gBqTA=.74135117-30da-4bc4-a3d2-c06d7bf1dd8a@github.com> References: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> <gs7MICgnmPqU3QYwn9XYjduMdxuyIW_mY7hWu5gBqTA=.74135117-30da-4bc4-a3d2-c06d7bf1dd8a@github.com> Message-ID: <mcPcLsC4jmCKDn9QaRxsmWWgP_K7kXGW1n1v5XK8Oeo=.2006fc3b-708e-4222-a84f-ef8d421d2479@github.com> On Thu, 24 Nov 2022 00:50:22 GMT, David Holmes <dholmes at 
openjdk.org> wrote: >> This is mainly an expansion of the included platforms by changing "linux and macOS" to "Non-Windows". There are a few additional examples, and clarification that they are just examples. There are also some minor edits and corrections I spotted. >> >> One actual fix relates to the "control-break" -> "control-" change. I can factor that out if needed (or just add an additional issue to the PR). >> >> This doesn't attempt to give complete platform recognition for all OpenJDK platforms. Two areas where anyone interested could file a further RFE are the support of DTrace on BSD systems other than macOS, and the use of RTM locking on the Power8 architecture (existing documentation is all about Intel TSX on x86). >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Fix formatting Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/11340 From duke at openjdk.org Fri Nov 25 20:27:58 2022 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 25 Nov 2022 20:27:58 GMT Subject: RFR: 8292741: Convert JvmtiTagMapTable to ResourceHashtable Message-ID: <Dj9YfY10SLn8dR1Ez3kEQe4tXm1q5W1TS8_4r3gBUsY=.6c38825a-86e5-4575-a509-683786d242a6@github.com> Tests of tier1-5 passed.
------------- Commit messages: - JBS-8292741: Convert JvmtiTagMapTable to ResourceHashtable Changes: https://git.openjdk.org/jdk/pull/11288/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11288&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8292741 Stats: 273 lines in 3 files changed: 42 ins; 135 del; 96 mod Patch: https://git.openjdk.org/jdk/pull/11288.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11288/head:pull/11288 PR: https://git.openjdk.org/jdk/pull/11288 From yyang at openjdk.org Sat Nov 26 04:16:10 2022 From: yyang at openjdk.org (Yi Yang) Date: Sat, 26 Nov 2022 04:16:10 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray In-Reply-To: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> Message-ID: <y8U9jmns9MOwnqYY5IKfZEdwxZv3AJaakxpkynyw0T0=.2b2aafbe-4047-4c69-aace-7326fd60c57e@github.com> On Fri, 25 Nov 2022 18:24:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > In `C1_MacroAssembler::initialize_body()` we test the input array length for 0. We do this again in `MacroAssembler::zero_memory()`. This results in a redundant test+jump instruction: > > > 13442 0x00007f58a8ca2f02: sub $0x10,%rsi > 13443 0x00007f58a8ca2f06: je 0x00007f58a8ca2f26 << > 13444 0x00007f58a8ca2f0c: test %rsi,%rsi > 13445 0x00007f58a8ca2f0f: je 0x00007f58a8ca2f26 << > 13446 0x00007f58a8ca2f15: xor %rbx,%rbx > 13447 0x00007f58a8ca2f18: shr $0x3,%rsi > 13448 0x00007f58a8ca2f1c: mov %rbx,0x8(%rax,%rsi,8) > 13449 0x00007f58a8ca2f21: dec %rsi > 13450 0x00007f58a8ca2f24: jne 0x00007f58a8ca2f1c ;*anewarray {reexecute=0 rethrow=0 return_oop=0} > 13451 ; - java.lang.invoke.MethodHandles::<clinit>@24 (line 5109) > > > Since `MacroAssembler::zero_memory()` is only ever called from `C1_MacroAssembler::initialize_body()`, it does not need to test for len=0, since its caller already does. 
> > --- > > Patch removes one test+jump and adds an assertion for len>0 to zero_memory. > > Patch ran through SAP nightlies and GHAs. Is it reasonable to remove such checking from the caller? Macro assembler is shared across all assemblers, zero_memory may be used in other place in the future due to its non-ad-hoc implementation. ------------- PR: https://git.openjdk.org/jdk/pull/11372 From kbarrett at openjdk.org Sat Nov 26 04:20:53 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 26 Nov 2022 04:20:53 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas In-Reply-To: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> References: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> Message-ID: <3ukPUQkmV0I3r2yep09R0pnhXMLH-FXqrVBURArjRfI=.3156a947-b0f3-47dc-b9b1-85f1ae01ac45@github.com> On Wed, 23 Nov 2022 10:24:42 GMT, Julian Waters <jwaters at openjdk.org> wrote: > Add alignas to the permitted features set. Though the corresponding entry mentions this should not be done for classes, there's no actual difference in practice with all our supported compilers, because their nonstandard syntax also has the same limitations and issues with dynamic allocation as the C++ alignas, and including such a restriction of falling back to ATTRIBUTE_ALIGNED in the case of classes in the style guide would ultimately not really serve much of a point No. Any proposal to permit `alignas` needs to address questions around "extended alignment", and especially "over-aligned types". Note that support for extended alignment in any particular context is implementation defined. (C++14 3.11/3) It looks like nearly all uses of compiler-specific alignment decorations (such as `__attribute__((aligned))`) in HotSpot are on variables with static duration. I think those are probably fine. 
(Early gcc implementations of `alignas` were pretty limited (more so than `__attribute__((aligned))` from what I found on the web), but that seems to have been fixed.) The only exceptions I found are uses of ZCACHE_ALIGN on class data members, which doesn't actually ensure alignment of those members! ------------- PR: https://git.openjdk.org/jdk/pull/11315 From stuefe at openjdk.org Sat Nov 26 08:16:08 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 26 Nov 2022 08:16:08 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray In-Reply-To: <y8U9jmns9MOwnqYY5IKfZEdwxZv3AJaakxpkynyw0T0=.2b2aafbe-4047-4c69-aace-7326fd60c57e@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> <y8U9jmns9MOwnqYY5IKfZEdwxZv3AJaakxpkynyw0T0=.2b2aafbe-4047-4c69-aace-7326fd60c57e@github.com> Message-ID: <ZeSZIoF0ilrZ6u2ZaI4l4m48oWZ0w4Z1TrW4Um1Wtnw=.93199a64-a46a-43ce-8894-4006283bed43@github.com> On Sat, 26 Nov 2022 04:14:01 GMT, Yi Yang <yyang at openjdk.org> wrote: > Is it reasonable to remove such checking from the caller? Macro assembler is shared across all assemblers, zero_memory may be used in other place in the future due to its non-ad-hoc implementation. That is why I added the assertion. Hopefully, such a hypothetical use would surface in tests. 
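To make the assertion-versus-guard trade-off concrete, here is a minimal C++ sketch of the loop shape discussed in this thread. It is an illustration only, not HotSpot code: `zero_words_unguarded`, `zero_words` and `initialize_body` are hypothetical stand-ins, and the 16-byte header is an assumption taken from the `sub $0x10,%rsi` in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Count-down store loop shaped like the emitted `mov; dec; jne` sequence:
// it stores first and tests afterwards, so it must not be entered with n == 0.
void zero_words_unguarded(std::uint64_t* base, std::size_t n) {
  assert(n > 0 && "zero length must be filtered out before the loop");
  do {
    base[n - 1] = 0;  // zero the highest remaining word
    --n;
  } while (n != 0);
}

// Callee-side guard: the function is safe for n == 0, whoever calls it.
void zero_words(std::uint64_t* base, std::size_t n) {
  if (n == 0) return;
  zero_words_unguarded(base, n);
}

// Hypothetical caller: skip a 16-byte header, then clear the rest of the object.
void initialize_body(std::uint64_t* obj, std::size_t size_in_bytes) {
  const std::size_t header_bytes = 16;
  const std::size_t body_bytes = size_in_bytes - header_bytes;
  zero_words(obj + header_bytes / sizeof(std::uint64_t),
             body_bytes / sizeof(std::uint64_t));
}
```

In this model the assertion only documents the precondition of the unguarded loop and is compiled out when `NDEBUG` is defined, whereas the guard in `zero_words` keeps a header-only object (body length 0) safe in every build.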
------------- PR: https://git.openjdk.org/jdk/pull/11372 From dholmes at openjdk.org Sat Nov 26 08:21:13 2022 From: dholmes at openjdk.org (David Holmes) Date: Sat, 26 Nov 2022 08:21:13 GMT Subject: RFR: 8297600: Check current thread in selected JRT_LEAF methods In-Reply-To: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> References: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> Message-ID: <xskkJCXnjhdPFIjQ1TlA99E6RaCLON41SkXM4z75KnQ=.ec771a19-fab2-417c-bc6b-98da99c0a0a8@github.com> On Thu, 24 Nov 2022 19:23:29 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > With [JDK-8275286](https://bugs.openjdk.org/browse/JDK-8275286), we added the `Thread::current()` checks for most of the JRT entries. But `JRT_LEAF` is still not checked, because not every `JRT_LEAF` carries a `JavaThread` argument. Having assertions there helps for two reasons. First, these methods can be called from the stub/compiler code, which might be erroneous with thread handling (especially in x86_32 that does not have a dedicated thread register). Second, in the post-Loom world, current thread can change suddenly, as evidenced here: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2022-November/060779.html. > > We can add the thread checks to relevant `JRT_LEAF` methods that accept `JavaThread*` too. > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier2` Unclear for many JVMCI functions that the thread argument is actually intended/required to be the current thread. It seems unused in many cases so why is it passed? 
src/hotspot/share/jvmci/jvmciRuntime.cpp line 584: > 582: > 583: JRT_LEAF(void, JVMCIRuntime::log_object(JavaThread* thread, oopDesc* obj, bool as_string, bool newline)) > 584: assert(thread == JavaThread::current(), "pre-condition"); `thread` seems unused in this function and so it is not obvious it has to be the current thread. src/hotspot/share/jvmci/jvmciRuntime.cpp line 611: > 609: > 610: void JVMCIRuntime::write_barrier_pre(JavaThread* thread, oopDesc* obj) { > 611: assert(thread == JavaThread::current(), "pre-condition"); Not obvious thread is expected/required to be current src/hotspot/share/jvmci/jvmciRuntime.cpp line 616: > 614: > 615: void JVMCIRuntime::write_barrier_post(JavaThread* thread, volatile CardValue* card_addr) { > 616: assert(thread == JavaThread::current(), "pre-condition"); Not obvious thread is expected/required to be current ------------- PR: https://git.openjdk.org/jdk/pull/11359 From aph at openjdk.org Sat Nov 26 09:52:11 2022 From: aph at openjdk.org (Andrew Haley) Date: Sat, 26 Nov 2022 09:52:11 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray In-Reply-To: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> Message-ID: <2CZ8Cau6RfyUo9KH2lv-hqf5bY3TgHlvF6cN3iSEbig=.2894ba10-76cf-4f41-8796-902d1693d979@github.com> On Fri, 25 Nov 2022 18:24:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > In `C1_MacroAssembler::initialize_body()` we test the input array length for 0. We do this again in `MacroAssembler::zero_memory()`. 
This results in a redundant test+jump instruction: > > > 13442 0x00007f58a8ca2f02: sub $0x10,%rsi > 13443 0x00007f58a8ca2f06: je 0x00007f58a8ca2f26 << > 13444 0x00007f58a8ca2f0c: test %rsi,%rsi > 13445 0x00007f58a8ca2f0f: je 0x00007f58a8ca2f26 << > 13446 0x00007f58a8ca2f15: xor %rbx,%rbx > 13447 0x00007f58a8ca2f18: shr $0x3,%rsi > 13448 0x00007f58a8ca2f1c: mov %rbx,0x8(%rax,%rsi,8) > 13449 0x00007f58a8ca2f21: dec %rsi > 13450 0x00007f58a8ca2f24: jne 0x00007f58a8ca2f1c ;*anewarray {reexecute=0 rethrow=0 return_oop=0} > 13451 ; - java.lang.invoke.MethodHandles::<clinit>@24 (line 5109) > > > Since `MacroAssembler::zero_memory()` is only ever called from `C1_MacroAssembler::initialize_body()`, it does not need to test for len=0, since its caller already does. > > --- > > Patch removes one test+jump and adds an assertion for len>0 to zero_memory. > > Patch ran through SAP nightlies and GHAs. Surely a macro assembler function that takes a length should not misbehave if that length is zero. It's a fairly basic correctness criterion that it must not, because zero is a valid number. We shouldn't be building fragility into the system. I agree that it makes sense to remove one of the checks, but it should be the one in the caller. ------------- PR: https://git.openjdk.org/jdk/pull/11372 From stuefe at openjdk.org Sat Nov 26 10:47:11 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 26 Nov 2022 10:47:11 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray [v2] In-Reply-To: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> Message-ID: <oyp9Vl8_j8isZQWo4m2R87jaSjLaIPPvTweDlNC8hYE=.0805670a-d885-4dbb-89ca-6f412d9d3085@github.com> > In `C1_MacroAssembler::initialize_body()` we test the input array length for 0. We do this again in `MacroAssembler::zero_memory()`.
This results in a redundant test+jump instruction: > > > 13442 0x00007f58a8ca2f02: sub $0x10,%rsi > 13443 0x00007f58a8ca2f06: je 0x00007f58a8ca2f26 << > 13444 0x00007f58a8ca2f0c: test %rsi,%rsi > 13445 0x00007f58a8ca2f0f: je 0x00007f58a8ca2f26 << > 13446 0x00007f58a8ca2f15: xor %rbx,%rbx > 13447 0x00007f58a8ca2f18: shr $0x3,%rsi > 13448 0x00007f58a8ca2f1c: mov %rbx,0x8(%rax,%rsi,8) > 13449 0x00007f58a8ca2f21: dec %rsi > 13450 0x00007f58a8ca2f24: jne 0x00007f58a8ca2f1c ;*anewarray {reexecute=0 rethrow=0 return_oop=0} > 13451 ; - java.lang.invoke.MethodHandles::<clinit>@24 (line 5109) > > > Since `MacroAssembler::zero_memory()` is only ever called from `C1_MacroAssembler::initialize_body()`, it does not need to test for len=0, since its caller already does. > > --- > > Patch removes one test+jump and adds an assertion for len>0 to zero_memory. > > Patch ran through SAP nightlies and GHAs. Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - remove outer conditional jump - Revert "remove-redundant-test-from-zeromemory" This reverts commit 4f95969d4d3026ce2310230c37f469579dc32e88. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/11372/files - new: https://git.openjdk.org/jdk/pull/11372/files/4f95969d..3372e965 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11372&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11372&range=00-01 Stats: 12 lines in 2 files changed: 3 ins; 8 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11372.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11372/head:pull/11372 PR: https://git.openjdk.org/jdk/pull/11372 From stuefe at openjdk.org Sat Nov 26 10:49:06 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 26 Nov 2022 10:49:06 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray In-Reply-To: <2CZ8Cau6RfyUo9KH2lv-hqf5bY3TgHlvF6cN3iSEbig=.2894ba10-76cf-4f41-8796-902d1693d979@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> <2CZ8Cau6RfyUo9KH2lv-hqf5bY3TgHlvF6cN3iSEbig=.2894ba10-76cf-4f41-8796-902d1693d979@github.com> Message-ID: <LOmDNEFvOrTztkBqoxF0MWgXGEN5W_KUfYf1uqEDeOY=.9ebca44a-be14-41ce-a024-b4c0a78cb7cd@github.com> On Sat, 26 Nov 2022 09:48:07 GMT, Andrew Haley <aph at openjdk.org> wrote: > Surely a macro assembler function that takes a length should not misbehave if that length is zero. It's a fairly basic correctness criterion that it must not, because zero is a valid number. We shouldn't be building fragility into the system. I agree that it makes sense to remove one of the checks, but it should be the one in the caller. Okay, did that. I did it the other way around originally since the outer test cannot be removed, it's the sub of the header size.
So the test in zero_memory is still redundant, but the outer jz has been removed at least: 0x00007f8080ca2f02: sub $0x10,%rsi << initialize_body 0x00007f8080ca2f06: test %rsi,%rsi << zero_memory 0x00007f8080ca2f09: je 0x00007f8080ca2f20 << zero_memory ------------- PR: https://git.openjdk.org/jdk/pull/11372 From aph at openjdk.org Sat Nov 26 11:08:08 2022 From: aph at openjdk.org (Andrew Haley) Date: Sat, 26 Nov 2022 11:08:08 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray [v2] In-Reply-To: <oyp9Vl8_j8isZQWo4m2R87jaSjLaIPPvTweDlNC8hYE=.0805670a-d885-4dbb-89ca-6f412d9d3085@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> <oyp9Vl8_j8isZQWo4m2R87jaSjLaIPPvTweDlNC8hYE=.0805670a-d885-4dbb-89ca-6f412d9d3085@github.com> Message-ID: <6a2fd7KnrCdLb7UTiJW2FVx5Zf0yIN9n0j9eGao5y_c=.260a21b4-9a6f-404b-9f72-1557ac99b856@github.com> On Sat, 26 Nov 2022 10:47:11 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> In `C1_MacroAssembler::initialize_body()` we test the input array length for 0. We do this again in `MacroAssembler::zero_memory()`. This results in a redundant test+jump instruction: >> >> >> 13442 0x00007f58a8ca2f02: sub $0x10,%rsi >> 13443 0x00007f58a8ca2f06: je 0x00007f58a8ca2f26 << >> 13444 0x00007f58a8ca2f0c: test %rsi,%rsi >> 13445 0x00007f58a8ca2f0f: je 0x00007f58a8ca2f26 << >> 13446 0x00007f58a8ca2f15: xor %rbx,%rbx >> 13447 0x00007f58a8ca2f18: shr $0x3,%rsi >> 13448 0x00007f58a8ca2f1c: mov %rbx,0x8(%rax,%rsi,8) >> 13449 0x00007f58a8ca2f21: dec %rsi >> 13450 0x00007f58a8ca2f24: jne 0x00007f58a8ca2f1c ;*anewarray {reexecute=0 rethrow=0 return_oop=0} >> 13451 ; - java.lang.invoke.MethodHandles::<clinit>@24 (line 5109) >> >> >> Since `MacroAssembler::zero_memory()` is only ever called from `C1_MacroAssembler::initialize_body()`, it does not need to test for len=0, since its caller already does. 
>> >> --- >> >> Patch removes one test+jump and adds an assertion for len>0 to zero_memory. >> >> Patch ran through SAP nightlies and GHAs. > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - remove outer conditional jump > - Revert "remove-redundant-test-from-zeromemory" > > This reverts commit 4f95969d4d3026ce2310230c37f469579dc32e88. Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11372 From yyang at openjdk.org Sat Nov 26 11:33:16 2022 From: yyang at openjdk.org (Yi Yang) Date: Sat, 26 Nov 2022 11:33:16 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray [v2] In-Reply-To: <oyp9Vl8_j8isZQWo4m2R87jaSjLaIPPvTweDlNC8hYE=.0805670a-d885-4dbb-89ca-6f412d9d3085@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> <oyp9Vl8_j8isZQWo4m2R87jaSjLaIPPvTweDlNC8hYE=.0805670a-d885-4dbb-89ca-6f412d9d3085@github.com> Message-ID: <cHwwb0mlt-sJ54J00NQI5k7ZmPVplNiRPYxcLwMT3jg=.bf67b772-8447-40fe-93c1-f700a5f6e8e8@github.com> On Sat, 26 Nov 2022 10:47:11 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> In `C1_MacroAssembler::initialize_body()` we test the input array length for 0. We do this again in `MacroAssembler::zero_memory()`. 
This results in a redundant test+jump instruction: >> >> >> 13442 0x00007f58a8ca2f02: sub $0x10,%rsi >> 13443 0x00007f58a8ca2f06: je 0x00007f58a8ca2f26 << >> 13444 0x00007f58a8ca2f0c: test %rsi,%rsi >> 13445 0x00007f58a8ca2f0f: je 0x00007f58a8ca2f26 << >> 13446 0x00007f58a8ca2f15: xor %rbx,%rbx >> 13447 0x00007f58a8ca2f18: shr $0x3,%rsi >> 13448 0x00007f58a8ca2f1c: mov %rbx,0x8(%rax,%rsi,8) >> 13449 0x00007f58a8ca2f21: dec %rsi >> 13450 0x00007f58a8ca2f24: jne 0x00007f58a8ca2f1c ;*anewarray {reexecute=0 rethrow=0 return_oop=0} >> 13451 ; - java.lang.invoke.MethodHandles::<clinit>@24 (line 5109) >> >> >> Since `MacroAssembler::zero_memory()` is only ever called from `C1_MacroAssembler::initialize_body()`, it does not need to test for len=0, since its caller already does. >> >> --- >> >> Patch removes one test+jump and adds an assertion for len>0 to zero_memory. >> >> Patch ran through SAP nightlies and GHAs. > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - remove outer conditional jump > - Revert "remove-redundant-test-from-zeromemory" > > This reverts commit 4f95969d4d3026ce2310230c37f469579dc32e88. Marked as reviewed by yyang (Committer). 
------------- PR: https://git.openjdk.org/jdk/pull/11372 From yyang at openjdk.org Sat Nov 26 11:36:33 2022 From: yyang at openjdk.org (Yi Yang) Date: Sat, 26 Nov 2022 11:36:33 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray [v2] In-Reply-To: <oyp9Vl8_j8isZQWo4m2R87jaSjLaIPPvTweDlNC8hYE=.0805670a-d885-4dbb-89ca-6f412d9d3085@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> <oyp9Vl8_j8isZQWo4m2R87jaSjLaIPPvTweDlNC8hYE=.0805670a-d885-4dbb-89ca-6f412d9d3085@github.com> Message-ID: <sNim4m4VrkFXedtkmZ5Y_oAXdWsS8ye0xDzuUrJnISk=.9b814b85-1f38-40ff-aa3c-ff0b212719fd@github.com> On Sat, 26 Nov 2022 10:47:11 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> In `C1_MacroAssembler::initialize_body()` we test the input array length for 0. We do this again in `MacroAssembler::zero_memory()`. This results in a redundant test+jump instruction: >> >> >> 13442 0x00007f58a8ca2f02: sub $0x10,%rsi >> 13443 0x00007f58a8ca2f06: je 0x00007f58a8ca2f26 << >> 13444 0x00007f58a8ca2f0c: test %rsi,%rsi >> 13445 0x00007f58a8ca2f0f: je 0x00007f58a8ca2f26 << >> 13446 0x00007f58a8ca2f15: xor %rbx,%rbx >> 13447 0x00007f58a8ca2f18: shr $0x3,%rsi >> 13448 0x00007f58a8ca2f1c: mov %rbx,0x8(%rax,%rsi,8) >> 13449 0x00007f58a8ca2f21: dec %rsi >> 13450 0x00007f58a8ca2f24: jne 0x00007f58a8ca2f1c ;*anewarray {reexecute=0 rethrow=0 return_oop=0} >> 13451 ; - java.lang.invoke.MethodHandles::<clinit>@24 (line 5109) >> >> >> Since `MacroAssembler::zero_memory()` is only ever called from `C1_MacroAssembler::initialize_body()`, it does not need to test for len=0, since its caller already does. >> >> --- >> >> Patch removes one test+jump and adds an assertion for len>0 to zero_memory. >> >> Patch ran through SAP nightlies and GHAs. 
> > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - remove outer conditional jump > - Revert "remove-redundant-test-from-zeromemory" > > This reverts commit 4f95969d4d3026ce2310230c37f469579dc32e88. I mean, remove the checking from caller instead of callee. zero_memory can be used in other places given that it is part of general macro assembler. ------------- PR: https://git.openjdk.org/jdk/pull/11372 From stuefe at openjdk.org Sat Nov 26 11:39:52 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 26 Nov 2022 11:39:52 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray [v2] In-Reply-To: <sNim4m4VrkFXedtkmZ5Y_oAXdWsS8ye0xDzuUrJnISk=.9b814b85-1f38-40ff-aa3c-ff0b212719fd@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> <oyp9Vl8_j8isZQWo4m2R87jaSjLaIPPvTweDlNC8hYE=.0805670a-d885-4dbb-89ca-6f412d9d3085@github.com> <sNim4m4VrkFXedtkmZ5Y_oAXdWsS8ye0xDzuUrJnISk=.9b814b85-1f38-40ff-aa3c-ff0b212719fd@github.com> Message-ID: <hrTkJRv2NgxwKecceZXYYU1PuUKlhdgJ8mO90DuvX2g=.4c83f9fe-32da-455a-a948-8e87063b6aa3@github.com> On Sat, 26 Nov 2022 11:33:30 GMT, Yi Yang <yyang at openjdk.org> wrote: > I mean, remove the checking from caller instead of callee. zero_memory can be used in other places given that it is part of general macro assembler. Did I not just do that? Could you check the latest version please? 
------------- PR: https://git.openjdk.org/jdk/pull/11372 From jwaters at openjdk.org Sat Nov 26 17:19:10 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 26 Nov 2022 17:19:10 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas In-Reply-To: <3ukPUQkmV0I3r2yep09R0pnhXMLH-FXqrVBURArjRfI=.3156a947-b0f3-47dc-b9b1-85f1ae01ac45@github.com> References: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> <3ukPUQkmV0I3r2yep09R0pnhXMLH-FXqrVBURArjRfI=.3156a947-b0f3-47dc-b9b1-85f1ae01ac45@github.com> Message-ID: <Rh-nRqpxUMfXqerj24BgGPns6Ju7zDppBPGa9EZjehw=.6bcd09fb-1931-4d61-8d0a-1c595bcac7d5@github.com> On Sat, 26 Nov 2022 04:17:23 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > No. Any proposal to permit `alignas` needs to address questions around "extended alignment", and especially "over-aligned types". Note that support for extended alignment in any particular context is implementation defined. (C++14 3.11/3) > > It looks like nearly all uses of compiler-specific alignment decorations (such as `__attribute__((aligned))`) in HotSpot are on variables with static duration. I think those are probably fine. (Early gcc implementations of `alignas` were pretty limited (more so than `__attribute__((aligned))` from what I found on the web), but that seems to have been fixed.) > > The only exceptions I found are uses of ZCACHE_ALIGN on class data members, which doesn't actually ensure alignment of those members! I'm assuming you're referring to this when talking about extended alignment? I'm not sure if you mean this warning has to be included together in the Style Guide with alignas An extended alignment is represented by an alignment greater than alignof(std::max_align_t). It is implementation-defined whether any extended alignments are supported and the contexts in which they are supported (7.6.2). A type having an extended alignment requirement is an over-aligned type. 
[ Note: every over-aligned type is or contains a class type to which extended alignment applies (possibly through a non-static data member). -- end note ] I don't think there's anything we can actually do about ZCACHE_ALIGNED, unfortunately, and the dynamic memory alignment mentioned in 8252584 is still going to be an issue regardless of whichever syntax we use (Utilizing a quick test I cobbled together):

`g++ -std=c++14 -o ./align alignment.cpp`

#include <cassert>
#include <cstdint>
#include <iostream>
#include <malloc.h>
#include <new>

class __attribute__((aligned(32))) AlignedVec {
    double x, y, z;
};

int main() {
    std::cout << "sizeof(AlignedVec) is " << sizeof(AlignedVec) << '\n';
    std::cout << "alignof(AlignedVec) is " << alignof(AlignedVec) << '\n';
    auto Vec = AlignedVec{};
    auto pVec = new AlignedVec[10];
    if(reinterpret_cast<uintptr_t>(&Vec) % alignof(AlignedVec) == 0)
        std::cout << "Vec is aligned to alignof(AlignedVec)!\n";
    else
        std::cout << "Vec is not aligned to alignof(AlignedVec)!\n";
    if(reinterpret_cast<uintptr_t>(pVec) % alignof(AlignedVec) == 0)
        std::cout << "pVec is aligned to alignof(AlignedVec)!\n";
    else
        std::cout << "pVec is not aligned to alignof(AlignedVec)!\n";
    delete[] pVec;
}

`./align`

sizeof(AlignedVec) is 32
alignof(AlignedVec) is 32
Vec is aligned to alignof(AlignedVec)!
pVec is not aligned to alignof(AlignedVec)!

------------- PR: https://git.openjdk.org/jdk/pull/11315 From duke at openjdk.org Sat Nov 26 22:24:04 2022 From: duke at openjdk.org (Afshin Zafari) Date: Sat, 26 Nov 2022 22:24:04 GMT Subject: RFR: 8287400: Make BitMap range parameter names consistent Message-ID: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> The ranges are determined by 'start' and 'end' all over the bitMap.hpp and bitMap.cpp. All instances of other names for start and end are replaced.
------------- Commit messages: - 8287400: Make BitMap range parameter names consistent Changes: https://git.openjdk.org/jdk/pull/11375/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11375&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8287400 Stats: 129 lines in 2 files changed: 0 ins; 0 del; 129 mod Patch: https://git.openjdk.org/jdk/pull/11375.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11375/head:pull/11375 PR: https://git.openjdk.org/jdk/pull/11375 From xuelei at openjdk.org Sun Nov 27 05:27:55 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Sun, 27 Nov 2022 05:27:55 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v13] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <4Wmny06cGOWkBkQ1oecPwZLp-SAke1Gun5o1-kKWLQI=.3984224b-526b-44e0-a1e3-e010a7ce7836@github.com> > Hi, > > May I have this update reviewed? > > sprintf is deprecated in Xcode 14 because of security concerns, and its use causes build failures. The build passes if warnings are disabled for code that uses sprintf. In the long run, sprintf could be replaced with snprintf. This patch checks whether snprintf can be used.
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: revert use of assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/4f80245f..ee72fb50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=11-12 Stats: 285 lines in 24 files changed: 15 ins; 181 del; 89 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Sun Nov 27 08:00:08 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 27 Nov 2022 08:00:08 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v12] In-Reply-To: <NZslB0NAOVhw3IKe-_JWMhfiSJb4p52wHnftlPsz86E=.44faae07-e7c6-4810-aab7-8019c8808c8a@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <mztLRX-PTuyfSXhNZR9d9z8Ax5pIz5j4UVIrIZVGst4=.ddc4ca8b-253a-4e25-96e5-0233465817da@github.com> <NZslB0NAOVhw3IKe-_JWMhfiSJb4p52wHnftlPsz86E=.44faae07-e7c6-4810-aab7-8019c8808c8a@github.com> Message-ID: <chToHFhLolTySTkjmihCukuyT2OydSgoJVRagHjzpA8=.21e615e3-631f-412c-ac7b-f5ad1b4fec02@github.com> On Tue, 22 Nov 2022 08:02:51 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Given all the near-duplicated checking of os::snprintf results, I think there is a place for a helper function to package this up. Maybe something like > > ``` > // in class os > // Performs snprintf and asserts the result is non-negative (so there was not > // an encoding error) and that the output was not truncated. > static int snprintf_checked(char* buf, size_t len, const char* fmt, ...) ATTRIBUTE_PRINTF(3, 4); > > // in runtime/os.cpp > int os::snprintf_checked(char* buf, size_t len, const char* fmt, ...) 
{ > va_list args; > va_start(args, fmt); > int result = os::vsnprintf(buf, len, fmt, args); > va_end(args); > assert(result >= 0, "os::snprintf error"); > assert(static_cast<size_t>(result) < len, "os::snprintf truncated"); > return result; > } > ``` > > (I keep waffling over whether the truncation check should be an assert or a guarantee.) > > I've not yet gone through all the changes to consider which should do that checking and which should do something different, such as permitting truncation. > > I'm not wedded to that name; indeed, I don't like it that much, as it's kind of inconveniently long. There's a temptation to have os::snprintf forbid truncation and a different function that allows it, but that would require careful auditing of all pre-existing uses of os::snprintf too, so no. How about renaming the existing os::snprintf to something like os::snprintf_unchecked, make os::snprintf the checked version, then, in separate RFEs, revert existing uses to the new API? When all uses of os::snprintf_unchecked are cleared up, remove it. That would make it possible to revert piecemeal while not racing with new uses of os::snprintf, since new callers will use the new checking API automatically. ------------- PR: https://git.openjdk.org/jdk/pull/11115 From kbarrett at openjdk.org Sun Nov 27 11:44:15 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 27 Nov 2022 11:44:15 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas In-Reply-To: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> References: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> Message-ID: <DZ0clFGSZHZ9Mnn6D43I1Z3zqgGWVEby5OGxhcqMcME=.cc2661d3-aebe-43c2-ada6-81635b02d12b@github.com> On Wed, 23 Nov 2022 10:24:42 GMT, Julian Waters <jwaters at openjdk.org> wrote: > Add alignas to the permitted features set.
Though the corresponding entry mentions this should not be done for classes, there's no actual difference in practice with all our supported compilers, because their nonstandard syntax also has the same limitations and issues with dynamic allocation as the C++ alignas, and including such a restriction of falling back to ATTRIBUTE_ALIGNED in the case of classes in the style guide would ultimately not really serve much of a point > > No. Any proposal to permit `alignas` needs to address questions around "extended alignment", and especially "over-aligned types". [...] > > I'm assuming you're referring to this when talking about extended alignment? I'm not sure if you mean this warning has to be included together in the Style Guide with alignas > > ``` > An extended alignment is represented by an alignment greater than alignof(std::max_align_t). It is > implementation-defined whether any extended alignments are supported and the contexts in which they > are supported (7.6.2). A type having an extended alignment requirement is an over-aligned type. [ Note: > every over-aligned type is or contains a class type to which extended alignment applies (possibly through a > non-static data member). -- end note ] > ``` That's the paragraph I referenced. A proposal to permit the use of `alignas` in HotSpot code needs to indicate where (if anywhere) the use of extended alignment is permitted. And "never" doesn't work, since (for example) there are a fair number of constants used by various x86 stub generators that are given extended alignment. Like I said, it probably works to use extended alignment for variables with static storage duration (C++14 3.7.1) with existing platforms and build configurations. Variables with automatic storage duration might also work (I haven't done much testing of that). Even for those cases there may be limitations on what values are supported.
Heap allocation of over-aligned types does not work until C++17 (in the sense that the requested alignment is not assured). Types or data members of types that are never heap allocated (either directly or by inclusion in some other type) would (probably, since we mostly don't allow the use of C++ thread-local variables) end up falling under one of the above. Conceptually, a type derived from StackObj or only ever included in such would fall under the automatic variable case, but I'm pretty sure there are places where StackObj is abused, such that it doesn't have any useful guarantees. So I suggest over-aligned types should be forbidden (at least until we move to C++17). The Style Guide should also discuss where the `alignas` should be placed. Some non-local variables have it on the declaration, while others have it on the definition. Perhaps it should always be on the declaration? Or is there a reason for that inconsistency? > I don't think there's anything we can actually do about ZCACHE_ALIGNED, unfortunately, and the dynamic memory alignment mentioned in 8252584 is still going to be an issue regardless of whichever syntax we use: That dynamic memory alignment is the thing that is fixed by C++17. I think none of the uses of ZCACHE_ALIGN need the requested alignment. They are instead attempting to ensure data members are on different cache lines. We have memory/padded.hpp for that kind of thing. I've already begun a discussion about this with ZGC developers.
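
For illustration only, here is a minimal C++14 sketch of the distinction drawn above between extended alignment on a variable with static storage duration and an over-aligned type. None of this is HotSpot code: the names `is_over_aligned`, `is_aligned`, `RequireFundamentalAlignment`, `Plain`, `Wide`, `table` and `verify_table_alignment` are hypothetical. It also assumes that `alignof(std::max_align_t)` is smaller than 64, which holds on the usual platforms but is implementation-defined, as is support for the extended alignment itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not a HotSpot API: true when T requires more than
// the largest fundamental alignment, i.e. T is an over-aligned type.
template <typename T>
constexpr bool is_over_aligned() {
  return alignof(T) > alignof(std::max_align_t);
}

// Returns true if the address p is a multiple of the given alignment.
inline bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

struct Plain { double x, y, z; };             // fundamental alignment
struct alignas(64) Wide { double x, y, z; };  // over-aligned (assumed: 64 > max_align_t)

// Extended alignment requested on a variable with static storage duration.
// The element type stays unsigned char, so no over-aligned type is involved.
alignas(64) static unsigned char table[64];

// One possible way to enforce "no over-aligned types before C++17":
// instantiating this with an over-aligned T fails at compile time.
template <typename T>
struct RequireFundamentalAlignment {
  static_assert(!is_over_aligned<T>(),
                "over-aligned types are not reliably heap allocated before C++17");
  using type = T;
};

static_assert(!is_over_aligned<Plain>(), "Plain uses a fundamental alignment");
static_assert(is_over_aligned<Wide>(), "Wide needs an extended alignment");

// Accepted, because Plain is not over-aligned.
// RequireFundamentalAlignment<Wide>::type would be rejected by the compiler.
using CheckedPlain = RequireFundamentalAlignment<Plain>::type;

// Runtime check that the static variable received the requested alignment.
inline void verify_table_alignment() {
  assert(is_aligned(table, 64));
}
```

The compile-time check only classifies types; whether a given extended alignment is honoured for static or automatic variables still depends on the compiler and platform, so the runtime check is a sanity test rather than a guarantee.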
------------- PR: https://git.openjdk.org/jdk/pull/11315 From epeter at openjdk.org Sun Nov 27 13:43:16 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 27 Nov 2022 13:43:16 GMT Subject: RFR: 8297640: Increase buffer size for buf (insert_features_names) in Abstract_VM_Version::insert_features_names In-Reply-To: <1e6lURBQT0cWfKMw-Sf4w2kkEDkedtCiDuG4pTeqUlI=.25cb2636-d9a0-4368-85e9-40fcfbdb4262@github.com> References: <Zv4q7sJKzMEoQzzL0OcWOPo_69ayAvluB22NC_Dund8=.efd7319f-09d5-4afb-a6af-bf0b8d6c0cdc@github.com> <1e6lURBQT0cWfKMw-Sf4w2kkEDkedtCiDuG4pTeqUlI=.25cb2636-d9a0-4368-85e9-40fcfbdb4262@github.com> Message-ID: <GXvD7LpLiyeQut2_wacRMfGSg2Ole7kyf1WpQdXNWOY=.f61082da-8b91-44c0-b0b1-2853f331d0c4@github.com> On Fri, 25 Nov 2022 14:41:12 GMT, Robbin Ehn <rehn at openjdk.org> wrote: >> As described in [JDK-8297640](https://bugs.openjdk.org/browse/JDK-8297640), the buffer is too small, I increased the size from 512 to 1024. >> >> The string needs to be a few characters larger, now that we have the additional feature `avx512_ifma`, added in [JDK-8288047](https://bugs.openjdk.org/browse/JDK-8288047). >> >> I manually tested it with `./java --version`, used to crash, now works. >> Running larger test suite now... > > Thanks! Thanks @robehn @chhagedorn for the reviews! 
------------- PR: https://git.openjdk.org/jdk/pull/11366 From epeter at openjdk.org Sun Nov 27 13:44:54 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 27 Nov 2022 13:44:54 GMT Subject: Integrated: 8297640: Increase buffer size for buf (insert_features_names) in Abstract_VM_Version::insert_features_names In-Reply-To: <Zv4q7sJKzMEoQzzL0OcWOPo_69ayAvluB22NC_Dund8=.efd7319f-09d5-4afb-a6af-bf0b8d6c0cdc@github.com> References: <Zv4q7sJKzMEoQzzL0OcWOPo_69ayAvluB22NC_Dund8=.efd7319f-09d5-4afb-a6af-bf0b8d6c0cdc@github.com> Message-ID: <dlnApElPJjMXz3OX0Nuk87tJFm1Mlapk2n5vI6KbHrk=.fb994e7a-5054-4cb0-b931-b018d831585d@github.com> On Fri, 25 Nov 2022 13:46:15 GMT, Emanuel Peter <epeter at openjdk.org> wrote: > As described in [JDK-8297640](https://bugs.openjdk.org/browse/JDK-8297640), the buffer is too small, I increased the size from 512 to 1024. > > The string needs to be a few characters larger, now that we have the additional feature `avx512_ifma`, added in [JDK-8288047](https://bugs.openjdk.org/browse/JDK-8288047). > > I manually tested it with `./java --version`, used to crash, now works. > Running larger test suite now... This pull request has now been integrated. 
Changeset: 2f83b5c4 Author: Emanuel Peter <epeter at openjdk.org> URL: https://git.openjdk.org/jdk/commit/2f83b5c487f112c175d081ca5882f5032518937a Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8297640: Increase buffer size for buf (insert_features_names) in Abstract_VM_Version::insert_features_names Reviewed-by: chagedorn, rehn ------------- PR: https://git.openjdk.org/jdk/pull/11366 From jwaters at openjdk.org Sun Nov 27 13:49:41 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 27 Nov 2022 13:49:41 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas [v2] In-Reply-To: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> References: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> Message-ID: <pPy5eCgVCXREucMdNtbj_xx_h7oeZOhyYfh23RTMa8M=.7b59b2b3-8ca3-4c06-8824-a9ca661e2a32@github.com> > Add alignas to the permitted features set. Though the corresponding entry mentions this should not be done for classes, there's no actual difference in practice with all our supported compilers, because their nonstandard syntax also has the same limitations and issues with dynamic allocation as the C++ alignas, and including such a restriction of falling back to ATTRIBUTE_ALIGNED in the case of classes in the style guide would ultimately not really serve much of a point Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Rectify issues mentioned in review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11315/files - new: https://git.openjdk.org/jdk/pull/11315/files/791563f1..1813e8c0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11315&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11315&range=00-01 Stats: 37 lines in 2 files changed: 32 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11315.diff Fetch: git fetch https://git.openjdk.org/jdk 
pull/11315/head:pull/11315 PR: https://git.openjdk.org/jdk/pull/11315 From jwaters at openjdk.org Sun Nov 27 13:56:14 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 27 Nov 2022 13:56:14 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas In-Reply-To: <DZ0clFGSZHZ9Mnn6D43I1Z3zqgGWVEby5OGxhcqMcME=.cc2661d3-aebe-43c2-ada6-81635b02d12b@github.com> References: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> <DZ0clFGSZHZ9Mnn6D43I1Z3zqgGWVEby5OGxhcqMcME=.cc2661d3-aebe-43c2-ada6-81635b02d12b@github.com> Message-ID: <fMwPTVU8YY-YUo8GhMsPRh41D85PHxZF1-97XDBDoYk=.ea20084c-2c7a-4d73-813d-fccba024893a@github.com> On Sun, 27 Nov 2022 11:42:00 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > > > No. Any proposal to permit `alignas` needs to address questions around "extended alignment", and especially "over-aligned types". [...] > > > > > > I'm assuming you're referring to this when talking about extended alignment? I'm not sure if you mean this warning has to be included together in the Style Guide with alignas > > ``` > > An extended alignment is represented by an alignment greater than alignof(std::max_align_t). It is > > implementation-defined whether any extended alignments are supported and the contexts in which they > > are supported (7.6.2). A type having an extended alignment requirement is an over-aligned type. [ Note: > > every over-aligned type is or contains a class type to which extended alignment applies (possibly through a > > non-static data member). -- end note ] > > ``` > > That's the paragraph I referenced. A proposal to permit the use of `alignas` in HotSpot code needs to indicate where (if anywhere) the use of extended alignment is permitted. And "never" doesn't work, since (for example) there are a fair number of constants used by various x86 stub generators that are given extended alignment.
> > Like I said, it probably works to use extended alignment for variables with static storage duration (C++14 3.7.1) with existing platforms and build configurations. > > Variables with automatic storage duration might also work (I haven't done much testing of that). > > Even for those cases there may be limitations on what values are supported. > > Heap allocation of over-aligned types does not work until C++17 (in the sense that the requested alignment is not assured). > > Types or data members of types that are never heap allocated (either directly or by inclusion in some other type) would (probably, since we mostly don't allow the use of C++ thread-local variables) end up falling under one of the above. Conceptually, a type derived from StackObj or only ever included in such would fall under the automatic variable case, but I'm pretty sure there are places where StackObj is abused, such that it doesn't have any useful guarantees. > > So I suggest over-aligned types should be forbidden (at least until we move to C++17). > > The Style Guide should also discuss where the `alignas` should be placed. Some non-local variables have it on the declaration, while others have it on the definition. Perhaps it should always be on the declaration? Or is there a reason for that inconsistency? > > > I don't think there's anything we can actually do about ZCACHE_ALIGNED, unfortunately, and the dynamic memory alignment mentioned in 8252584 is still going to be an issue regardless of whichever syntax we use: > > That dynamic memory alignment is the thing that is fixed by C++17. > > I think none of the uses of ZCACHE_ALIGN need the requested alignment. They are instead attempting to ensure data members are on different cache lines. We have memory/padded.hpp for that kind of thing. I've already begun a discussion about this with ZGC developers.
I've modified the Style Guide to address some of the issues brought up in the review for the time being. > The Style Guide should also discuss where the `alignas` should be placed. Some non-local variables have it on the declaration, while others have it on the definition. Perhaps it should always be on the declaration? Or is there a reason for that inconsistency? In my opinion it would be helpful to place it at both declaration and definition, just before the type specifier, so it's easier to determine that a variable has a particular alignment without having to check both. If that proves too troublesome, specifying it only at the declaration may be enough. Small side note: Should it also mention that ATTRIBUTE_ALIGNED will be/is deprecated after this style change, and/or that using it is equivalent to `alignas`? After all, as discussed above, there isn't actually a difference between using either to specify alignment on whatever object or type it's desired to be used on. ------------- PR: https://git.openjdk.org/jdk/pull/11315 From mernst at openjdk.org Sun Nov 27 17:53:30 2022 From: mernst at openjdk.org (Michael Ernst) Date: Sun, 27 Nov 2022 17:53:30 GMT Subject: RFR: 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni [v2] In-Reply-To: <JM2iW2-imTXG8KxxinxGWCXP8FwOesEjeQheWbooMoY=.2dfbad6d-e4e3-43b3-90b9-fdd7ac22e9d1@github.com> References: <Son2XWweoJeNzBalwQ4_ujZfK_jkwdh5tEozAk8QXXo=.e1ccc05f-84d7-454d-be22-5855aeed0c52@github.com> <JM2iW2-imTXG8KxxinxGWCXP8FwOesEjeQheWbooMoY=.2dfbad6d-e4e3-43b3-90b9-fdd7ac22e9d1@github.com> Message-ID: <gOXCJJl7JqlhdMiDjhu5kbAGieFHPelXHl_XvNG3Pyw=.881b6edb-36c1-4044-99da-18b45b28a69b@github.com> On Mon, 26 Sep 2022 16:51:36 GMT, Michael Ernst <mernst at openjdk.org> wrote: >> 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni > > Michael Ernst has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: > > - Reinstate typos in Apache code that is copied into the JDK > - Merge ../jdk-openjdk into typos-typos > - Remove file that was removed upstream > - Fix inconsistency in capitalization > - Undo change in zlip > - Fix typos Could someone who knows the undocumented ins and outs of creating JDK pull requests split this pull request up into multiple PRs? Then it can be merged, rather than wasting all the effort that went into it. ------------- PR: https://git.openjdk.org/jdk/pull/10029 From dholmes at openjdk.org Mon Nov 28 00:48:49 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Nov 2022 00:48:49 GMT Subject: RFR: 8286185: The Java manpage can be more platform inclusive [v3] In-Reply-To: <mcPcLsC4jmCKDn9QaRxsmWWgP_K7kXGW1n1v5XK8Oeo=.2006fc3b-708e-4222-a84f-ef8d421d2479@github.com> References: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> <gs7MICgnmPqU3QYwn9XYjduMdxuyIW_mY7hWu5gBqTA=.74135117-30da-4bc4-a3d2-c06d7bf1dd8a@github.com> <mcPcLsC4jmCKDn9QaRxsmWWgP_K7kXGW1n1v5XK8Oeo=.2006fc3b-708e-4222-a84f-ef8d421d2479@github.com> Message-ID: <hmwZgIHJHRLNHl7aRr2eE9J5WwWG4ZriI2rXmyPY8zk=.95c1c06e-f02d-41cf-b249-7eb3b7b25de7@github.com> On Fri, 25 Nov 2022 20:22:25 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix formatting > > Good. Thanks @vnkozlov !
------------- PR: https://git.openjdk.org/jdk/pull/11340 From dholmes at openjdk.org Mon Nov 28 01:07:56 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Nov 2022 01:07:56 GMT Subject: RFR: 8287400: Make BitMap range parameter names consistent In-Reply-To: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> References: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> Message-ID: <CSFmNBqNbzDdm_15JXgBMwFALqnskJeCkCoHkxS61kI=.b47d8b03-bd5c-4c0c-9d14-a3e3e25851af@github.com> On Sat, 26 Nov 2022 22:16:34 GMT, Afshin Zafari <duke at openjdk.org> wrote: > The ranges are determined by 'start' and 'end' all over the bitMap.hpp and bitMap.cpp. All instances of other names for start and end are replaced. I only see one use of `start_offset` in bitMap.cpp and no uses of `start` in bitMap.hpp - so this is really changing the vast majority of the names. The simplest change for consistency would be to always use `beg` and `end`. ------------- PR: https://git.openjdk.org/jdk/pull/11375 From njian at openjdk.org Mon Nov 28 01:42:06 2022 From: njian at openjdk.org (Ningsheng Jian) Date: Mon, 28 Nov 2022 01:42:06 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v5] In-Reply-To: <askgzzGaeFCd8lefaQsn090eUx0HLZ-nn3lmz0ZjSOw=.457403b6-5564-4e30-a55d-3b6dbe59d7ca@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> <askgzzGaeFCd8lefaQsn090eUx0HLZ-nn3lmz0ZjSOw=.457403b6-5564-4e30-a55d-3b6dbe59d7ca@github.com> Message-ID: <Y7NV_ihwNeiAie1bjmHoSu7VcWYjcjT0YNqXn4e_MMQ=.8880c3f1-a39a-47bc-8719-a3250399229d@github.com> On Thu, 24 Nov 2022 15:56:08 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. 
This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Resolve merge conflicts with master > - Merge branch 'master' into JDK-8293488 > - Removed svesha3 feature check for eor3 > - Changed the modifier order preference in JTREG test > - Modified JTREG test to include feature constraints > - 8293488: Add EOR3 backend rule for aarch64 SHA3 extension > > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those > SHA3 instructions - "eor3" performs an exclusive OR of three vectors. > This is helpful in applications that have multiple, consecutive "eor" > operations which can be reduced by clubbing them into fewer operations > using the "eor3" instruction. 
For example - > eor a, a, b > eor a, a, c > can be optimized to single instruction - eor3 a, b, c > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and > a micro benchmark to assess the performance gains with this patch. > Following are the results of the included micro benchmark on a 128-bit > aarch64 machine that supports Neon, SVE2 and SHA3 features - > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > The numbers shown are performance gains with using Neon eor3 instruction > over the master branch that uses multiple "eor" instructions instead. > Similar gains can be observed with the SVE2 "eor3" version as well since > the "eor3" instruction is unpredicated and the machine under test uses a > maximum vector width of 128 bits which makes the SVE2 code generation very > similar to the one with Neon. Marked as reviewed by njian (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/10407 From dholmes at openjdk.org Mon Nov 28 01:44:06 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Nov 2022 01:44:06 GMT Subject: RFR: 8292741: Convert JvmtiTagMapTable to ResourceHashtable In-Reply-To: <Dj9YfY10SLn8dR1Ez3kEQe4tXm1q5W1TS8_4r3gBUsY=.6c38825a-86e5-4575-a509-683786d242a6@github.com> References: <Dj9YfY10SLn8dR1Ez3kEQe4tXm1q5W1TS8_4r3gBUsY=.6c38825a-86e5-4575-a509-683786d242a6@github.com> Message-ID: <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> On Tue, 22 Nov 2022 14:48:11 GMT, Afshin Zafari <duke at openjdk.org> wrote: > test of tier1-5 passed. Hi Afshin, The general conversion approach seems okay, though it is hard to see exactly how the new table is used compared to the old table i.e. in relation to the role of `JvmtiTagMapEntry` now that it is not needed for the actual table. ?? A few queries below and a lot of nits to fix up (sorry). Thanks. 
src/hotspot/share/prims/jvmtiTagMap.cpp line 173:

> 171: //
> 172: static inline jlong tag_for(JvmtiTagMap* tag_map, oop o) {
> 173: return tag_map->hashmap()->find(o); } // A CallbackWrapper is a support class for querying and tagging an object // around a callback to a profiler. The constructor does pre-callback // work to get the tag value, klass tag value, ... and the destructor // does the post-callback work of tagging or untagging the object. // // {

The formatting of this line is messed up.

src/hotspot/share/prims/jvmtiTagMap.cpp line 215:

> 213: 
> 214: // get object tag
> 215: 

The comment and action are now out-of-order.

src/hotspot/share/prims/jvmtiTagMap.cpp line 254:

> 252: if (obj_tag != current_tag ) {
> 253: hashmap->remove(o);
> 254: hashmap->add(o, obj_tag);

This change is not atomic - is that a problem? The concurrency aspects of using this map are not clear.

src/hotspot/share/prims/jvmtiTagMap.cpp line 303:

> 301: _referrer_obj_tag = _referrer_hashmap->find(_referrer);
> 302: 
> 303: // get object tag

The comment and action are now out-of-order.

src/hotspot/share/prims/jvmtiTagMap.cpp line 313:

> 311: ~TwoOopCallbackWrapper() {
> 312: if (!is_reference_to_self()){
> 313: 

Unnecessary blank line added.

src/hotspot/share/prims/jvmtiTagMap.cpp line 345:

> 343: 
> 344: //JvmtiTagMapEntry entry ;
> 345: //_hashmap->add_update_remove(&entry, o, tag);

What are these comments for?

src/hotspot/share/prims/jvmtiTagMap.cpp line 365:

> 363: } else {
> 364: hashmap->remove(o);
> 365: hashmap->add(o,tag);

Nit: need space after comma

src/hotspot/share/prims/jvmtiTagMap.cpp line 1261:

> 1259: // and record the reference and tag value.
> 1260: //
> 1261: bool do_entry(JvmtiTagMapEntry & key , jlong & value ) {

Nit: no space before &

src/hotspot/share/prims/jvmtiTagMap.cpp line 1262:

> 1260: //
> 1261: bool do_entry(JvmtiTagMapEntry & key , jlong & value ) {
> 1262: for (int i=0; i<_tag_count; i++) {

Pre-existing: need spaces around binary operators = and <

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 77:

> 75: 
> 76: void JvmtiTagMapTable::clear() {
> 77: struct RemoveAll{

Nit: space before {

Stylistically I'm not sure we define local structs like this when defining a "closure" object. ??

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 89:

> 87: 
> 88: JvmtiTagMapTable::~JvmtiTagMapTable() {
> 89: clear();

Is this the only use of `clear`? If so we can just inline its code here and remove it from the API.

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 95:

> 93: //if (obj->fast_no_hash_check()) {
> 94: // return 0;
> 95: //} else {

What are these comments?

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 102:

> 100: }
> 101: 
> 102: bool JvmtiTagMapTable::add(oop obj, jlong tag) {

I'm not seeing that a return value has any use here when it is always expected to be true.

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 123:

> 121: 
> 122: void JvmtiTagMapTable::remove_dead_entries(GrowableArray<jlong>* objects) {
> 123: struct IsDead{

Nit: space before {

Same query about using a local struct for this.
src/hotspot/share/prims/jvmtiTagMapTable.cpp line 128:

> 126: bool do_entry(JvmtiTagMapEntry const & entry, jlong tag){
> 127: if ( entry.object_no_keepalive() == NULL){
> 128: if(_objects!=NULL){

Nit: need space before {
Nit: need space after if
Nit: no space after (

src/hotspot/share/prims/jvmtiTagMapTable.hpp line 69:

> 67: JvmtiTagMapEntry::get_hash,
> 68: JvmtiTagMapEntry::equals
> 69: > ResizableResourceHT ;

Nit: keep > on previous line
Nit: no space before ;

src/hotspot/share/prims/jvmtiTagMapTable.hpp line 79:

> 77: 
> 78: void resize_if_needed();
> 79: ResizableResourceHT _rrht_table;

This can just be `_table` - no need for the `rrht` prefix.

src/hotspot/share/prims/jvmtiTagMapTable.hpp line 104:

> 102: class JvmtiTagMapEntryClosure {
> 103: public:
> 104: virtual bool do_entry(JvmtiTagMapEntry & key , jlong & value) = 0;

Nit: no space needed before &

-------------

Changes requested by dholmes (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11288

From xuelei at openjdk.org Mon Nov 28 07:19:27 2022
From: xuelei at openjdk.org (Xue-Lei Andrew Fan)
Date: Mon, 28 Nov 2022 07:19:27 GMT
Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v14]
In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com>
References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com>
Message-ID: <hMkiVzxeWGTOxJhiGVD0pkwz6G_mPb6-b-aNTtuUY1I=.1f28d412-b5eb-4a0d-ba1e-75494a06ce96@github.com>

> Hi,
> 
> May I have this update reviewed?
> 
> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used.
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: no check on adlc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/ee72fb50..6d91a6d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=12-13 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From stuefe at openjdk.org Mon Nov 28 07:37:35 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 28 Nov 2022 07:37:35 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray [v3] In-Reply-To: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> Message-ID: <YjFViZEurRxai1pihV4FK8Nd7r0hWm38VOasPhmuz8g=.d8aec25c-e306-40ad-8d74-c6f99ac321c2@github.com> > In `C1_MacroAssembler::initialize_body()` we test the input array length for 0. We do this again in `MacroAssembler::zero_memory()`. 
This results in a redundant test+jump instruction:
> 
> 
> 13442 0x00007f58a8ca2f02: sub $0x10,%rsi
> 13443 0x00007f58a8ca2f06: je 0x00007f58a8ca2f26 <<
> 13444 0x00007f58a8ca2f0c: test %rsi,%rsi
> 13445 0x00007f58a8ca2f0f: je 0x00007f58a8ca2f26 <<
> 13446 0x00007f58a8ca2f15: xor %rbx,%rbx
> 13447 0x00007f58a8ca2f18: shr $0x3,%rsi
> 13448 0x00007f58a8ca2f1c: mov %rbx,0x8(%rax,%rsi,8)
> 13449 0x00007f58a8ca2f21: dec %rsi
> 13450 0x00007f58a8ca2f24: jne 0x00007f58a8ca2f1c ;*anewarray {reexecute=0 rethrow=0 return_oop=0}
> 13451 ; - java.lang.invoke.MethodHandles::<clinit>@24 (line 5109)
> 
> 
> Since `MacroAssembler::zero_memory()` is only ever called from `C1_MacroAssembler::initialize_body()`, it does not need to test for len=0, since its caller already does.
> 
> ---
> 
> Patch removes one test+jump and adds an assertion for len>0 to zero_memory.
> 
> Patch ran through SAP nightlies and GHAs.

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into double-test-allocate-array
 - remove outer conditional jump
 - Revert "remove-redundant-test-from-zeromemory"

   This reverts commit 4f95969d4d3026ce2310230c37f469579dc32e88.
- remove-redundant-test-from-zeromemory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11372/files - new: https://git.openjdk.org/jdk/pull/11372/files/3372e965..9e1b9333 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11372&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11372&range=01-02 Stats: 791 lines in 27 files changed: 193 ins; 490 del; 108 mod Patch: https://git.openjdk.org/jdk/pull/11372.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11372/head:pull/11372 PR: https://git.openjdk.org/jdk/pull/11372 From thartmann at openjdk.org Mon Nov 28 07:44:28 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Nov 2022 07:44:28 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" [v2] In-Reply-To: <6gcjBNNHk8twSr92oF2TGB90s6F4WnFZWT4xJPmuYoc=.c9c47aad-82a6-44a3-bc50-68e2b9e0a7c6@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> <6gcjBNNHk8twSr92oF2TGB90s6F4WnFZWT4xJPmuYoc=.c9c47aad-82a6-44a3-bc50-68e2b9e0a7c6@github.com> Message-ID: <dk5y1-dyHxy3wKEmfkCp0RHbtGOIjLgXfP6xrlvcoe0=.275adde4-621c-4967-b094-1eadb9a2bc9c@github.com> On Wed, 23 Nov 2022 19:58:41 GMT, Smita Kamath <svkamath at openjdk.org> wrote: >> 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments Looks reasonable to me. src/hotspot/share/runtime/sharedRuntime.cpp line 472: > 470: } > 471: > 472: jint exp = ((0x7f800000 & doppel) >> (24 -1)) - 127; Suggestion: jint exp = ((0x7f800000 & doppel) >> (24 - 1)) - 127; ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11301 From xuelei at openjdk.org Mon Nov 28 07:48:23 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 28 Nov 2022 07:48:23 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v15] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <cRrdygopTmQsFJm5dU6f8FWc76n2alY_BTAiAROgxHw=.4d7c7cf4-033a-4366-9112-6735339ee555@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. > > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: use checked snprintf ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/6d91a6d7..4143f51e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=13-14 Stats: 30 lines in 13 files changed: 0 ins; 0 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Mon Nov 28 07:55:02 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 28 Nov 2022 07:55:02 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v15] In-Reply-To: <cRrdygopTmQsFJm5dU6f8FWc76n2alY_BTAiAROgxHw=.4d7c7cf4-033a-4366-9112-6735339ee555@github.com> References: 
<fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <cRrdygopTmQsFJm5dU6f8FWc76n2alY_BTAiAROgxHw=.4d7c7cf4-033a-4366-9112-6735339ee555@github.com>
Message-ID: <K_Om_VKTPct0XY4fXg3anJ8MouBf6QXaqhwAyENbprI=.845c2e00-6f95-4beb-8994-0c7a2442e6df@github.com>

On Mon, 28 Nov 2022 07:48:23 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote:

>> Hi,
>> 
>> May I have this update reviewed?
>> 
>> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used.
>> 
>> Thanks,
>> Xuelei
> 
> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision:
> 
> use checked snprintf

Please hold off on the review. I ran into weird issues while using checked snprintf in `adlc`.

-------------

PR: https://git.openjdk.org/jdk/pull/11115

From iwalulya at openjdk.org Mon Nov 28 08:34:37 2022
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Mon, 28 Nov 2022 08:34:37 GMT
Subject: RFR: 8296954: G1: Enable parallel scanning for heap region remset [v2]
In-Reply-To: <fqG3Jl3VV3LulSnoR6FoVIqQ3ETGtQZf9ZRS7mKVHDM=.2fc28e74-04df-48b5-a3d7-fd8902a86cc2@github.com>
References: <fqG3Jl3VV3LulSnoR6FoVIqQ3ETGtQZf9ZRS7mKVHDM=.2fc28e74-04df-48b5-a3d7-fd8902a86cc2@github.com>
Message-ID: <Z3KRru96hq1olxNqjZ4egmsFUE2CbPvFtK9LiseIUak=.c9a51504-8541-4be8-8528-0258fd99cce5@github.com>

> Hi all,
> 
> Please review this change that allows parallel scanning of a heap region's remembered set. This gives a more balanced workload distribution in cases where cards are unevenly distributed among remembered sets.
> > Testing: Tier 1-3 > > Thanks Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Thomas review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11173/files - new: https://git.openjdk.org/jdk/pull/11173/files/b0891932..358e550a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11173&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11173&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11173.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11173/head:pull/11173 PR: https://git.openjdk.org/jdk/pull/11173 From jpai at openjdk.org Mon Nov 28 09:02:13 2022 From: jpai at openjdk.org (Jaikiran Pai) Date: Mon, 28 Nov 2022 09:02:13 GMT Subject: RFR: 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni [v2] In-Reply-To: <gOXCJJl7JqlhdMiDjhu5kbAGieFHPelXHl_XvNG3Pyw=.881b6edb-36c1-4044-99da-18b45b28a69b@github.com> References: <Son2XWweoJeNzBalwQ4_ujZfK_jkwdh5tEozAk8QXXo=.e1ccc05f-84d7-454d-be22-5855aeed0c52@github.com> <JM2iW2-imTXG8KxxinxGWCXP8FwOesEjeQheWbooMoY=.2dfbad6d-e4e3-43b3-90b9-fdd7ac22e9d1@github.com> <gOXCJJl7JqlhdMiDjhu5kbAGieFHPelXHl_XvNG3Pyw=.881b6edb-36c1-4044-99da-18b45b28a69b@github.com> Message-ID: <o4GUMwlhtn6ng4FZjHRzZ3iqJD0dlAWKnXsufxhmbqk=.4920d482-3bba-4814-88b6-6a7e839349e2@github.com> On Sun, 27 Nov 2022 17:49:57 GMT, Michael Ernst <mernst at openjdk.org> wrote: > Could someone who knows the undocumented ins and outs of creating JDK pull requests could split this pull request up into multiple PRs? Then it can be merged, rather than wasting all the effort that went into it. I've raised https://github.com/openjdk/jdk/pull/11385 for one set of changes from this current PR. I'll pick up the other ones shortly in different PRs. 
------------- PR: https://git.openjdk.org/jdk/pull/10029 From jpai at openjdk.org Mon Nov 28 09:59:43 2022 From: jpai at openjdk.org (Jaikiran Pai) Date: Mon, 28 Nov 2022 09:59:43 GMT Subject: RFR: 8297693: Fix typos in src/hotspot and test/hotspot files Message-ID: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> Can I please get a review for this change which only fixes typos in src/hotspot and test/hotspot files? These changes were originally done by @mernst in PR https://github.com/openjdk/jdk/pull/10029, but given that the other PR touches multiple other files and areas, the progress was stalled. This PR introduces only hotspot related typo fixes that Michael had proposed in the other PR plus also includes a review suggestion (in one of these files) that was made by Alexey in the other PR. ------------- Commit messages: - address review comment by Alexey Ivanov - 8297693: Fix typos in src/hotspot and test/hotspot files Changes: https://git.openjdk.org/jdk/pull/11386/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11386&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297693 Stats: 15 lines in 12 files changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/11386.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11386/head:pull/11386 PR: https://git.openjdk.org/jdk/pull/11386 From kevinw at openjdk.org Mon Nov 28 11:04:53 2022 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 28 Nov 2022 11:04:53 GMT Subject: RFR: 8297693: Fix typos in src/hotspot and test/hotspot files In-Reply-To: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> References: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> Message-ID: <bJCK-ZrbiLs8PuPKQA-n1gOftf0Iu32rtKk316gT85I=.fd2d79b5-e5bc-4d2e-81c8-005ab24fdbd6@github.com> On Mon, 28 Nov 2022 09:51:25 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > Can I please get a 
review for this change which only fixes typos in src/hotspot and test/hotspot files? These changes were originally done by @mernst in PR https://github.com/openjdk/jdk/pull/10029, but given that the other PR touches multiple other files and areas, the progress was stalled.
> 
> This PR introduces only hotspot related typo fixes that Michael had proposed in the other PR plus also includes a review suggestion (in one of these files) that was made by Alexey in the other PR.

Marked as reviewed by kevinw (Committer).

-------------

PR: https://git.openjdk.org/jdk/pull/11386

From duke at openjdk.org Mon Nov 28 11:34:49 2022
From: duke at openjdk.org (Afshin Zafari)
Date: Mon, 28 Nov 2022 11:34:49 GMT
Subject: RFR: 8287400: Make BitMap range parameter names consistent
In-Reply-To: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com>
References: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com>
Message-ID: <bnkgd8W17aWL-NEO-8y8CTjuDD143yPAllnqOoIDpLU=.06cfa8da-57d2-410b-9077-82ded32790d0@github.com>

On Sat, 26 Nov 2022 22:16:34 GMT, Afshin Zafari <duke at openjdk.org> wrote:

> The ranges are determined by 'start' and 'end' all over the bitMap.hpp and bitMap.cpp. All instances of other names for start and end are replaced.

Problem description:
Many BitMap operations take a range, specified via a pair of parameters. Some use the names "start" and "end", others use "beg" and "end", and the search functions use "l_index" and "r_index". It would be nice if these were consistent. "start" and "end" seem like a good choice.

Patch:
The pair `start` and `end` is used wherever a range is defined or referred to.

Should I keep `beg` and `end` and change the fewer instances of `start`?
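
To make the consistency question concrete, here is a minimal sketch of a bitmap whose range operations all share one parameter pair. `TinyBitMap` and its member names are hypothetical and are not taken from bitMap.hpp; the half-open meaning `[beg, end)` is an assumption of the sketch. It uses `beg`/`end` only because one pair has to be picked; the thread has not settled between `beg`/`end` and `start`/`end`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature bitmap, for illustration only. It is not HotSpot's
// BitMap. Every range parameter is named (beg, end) and means [beg, end).
class TinyBitMap {
 public:
  explicit TinyBitMap(std::size_t size_in_bits) : _bits(size_in_bits, false) {}

  std::size_t size() const { return _bits.size(); }

  // Sets every bit in [beg, end).
  void set_range(std::size_t beg, std::size_t end) {
    verify_range(beg, end);
    for (std::size_t i = beg; i < end; ++i) _bits[i] = true;
  }

  // Clears every bit in [beg, end).
  void clear_range(std::size_t beg, std::size_t end) {
    verify_range(beg, end);
    for (std::size_t i = beg; i < end; ++i) _bits[i] = false;
  }

  // Returns the index of the first set bit in [beg, end), or end if none.
  std::size_t find_first_set_bit(std::size_t beg, std::size_t end) const {
    verify_range(beg, end);
    for (std::size_t i = beg; i < end; ++i) {
      if (_bits[i]) return i;
    }
    return end;
  }

  // Counts the set bits in [beg, end).
  std::size_t count_one_bits(std::size_t beg, std::size_t end) const {
    verify_range(beg, end);
    std::size_t count = 0;
    for (std::size_t i = beg; i < end; ++i) {
      if (_bits[i]) ++count;
    }
    return count;
  }

 private:
  // One shared check, so every operation agrees on what a valid range is.
  void verify_range(std::size_t beg, std::size_t end) const {
    assert(beg <= end && "range must not be inverted");
    assert(end <= _bits.size() && "range must lie within the map");
  }

  std::vector<bool> _bits;
};
```

With a single naming pair, the search function reads the same way as the mutators: a caller can pass the same two values to `set_range` and `find_first_set_bit` without translating between `l_index`/`r_index` and `beg`/`end`.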
------------- PR: https://git.openjdk.org/jdk/pull/11375 From duke at openjdk.org Mon Nov 28 11:38:20 2022 From: duke at openjdk.org (Afshin Zafari) Date: Mon, 28 Nov 2022 11:38:20 GMT Subject: RFR: 8292741: Convert JvmtiTagMapTable to ResourceHashtable In-Reply-To: <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> References: <Dj9YfY10SLn8dR1Ez3kEQe4tXm1q5W1TS8_4r3gBUsY=.6c38825a-86e5-4575-a509-683786d242a6@github.com> <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> Message-ID: <WC0j16HlKWRzgKhC9xjs_ZQr-42Bk7_cMOOCK3rY-yo=.c5f6e693-7d51-4277-86ab-ad250967041c@github.com> On Mon, 28 Nov 2022 01:12:18 GMT, David Holmes <dholmes at openjdk.org> wrote: >> test of tier1-5 passed. > > src/hotspot/share/prims/jvmtiTagMap.cpp line 254: > >> 252: if (obj_tag != current_tag ) { >> 253: hashmap->remove(o); >> 254: hashmap->add(o, obj_tag); > > This change is not atomic - is that a problem? The concurrency aspects of using this map are not clear. add() is supposed to add a new object with tag. There is an assert() in its body that checks it. By removing the assert, add() can be used for updating as well. ------------- PR: https://git.openjdk.org/jdk/pull/11288 From fjiang at openjdk.org Mon Nov 28 11:39:53 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 28 Nov 2022 11:39:53 GMT Subject: RFR: 8297697: RISC-V: Add support for SATP mode detection Message-ID: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> RISC-V gets sv57-based virtual memory support since Linux 5.18 [1]. There are some reports of the OpenJDK RISC-V port crashing on Linux 5.18+ with QEMU-system 7.10+ when sv57 was enabled [2][3] as currently RISC-V port only supports up to sv48. 
As discussed in [3], given the fact that there are no existing boards or hardware even support anything more than sv48, we decide to add detection for SATP mode at JVM startup time if possible and explicitly issue a warning and stop early when sv57 is enabled. When sv57 is enabled, the output of java -version would be: root at qemuriscv64:~# jdk/bin/java -version Error occurred during initialization of VM Unsupported satp mode: 10 [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa5b537b0ecc16992577b013f11112d54c7ce869 [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000639.html [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000681.html Testing: - QEMU-system with sv48/sv57-enabled Linux image - HiFive Unmatched board (sv39) ------------- Commit messages: - Add detection of satp mode Changes: https://git.openjdk.org/jdk/pull/11388/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11388&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297697 Stats: 43 lines in 3 files changed: 42 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11388.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11388/head:pull/11388 PR: https://git.openjdk.org/jdk/pull/11388 From duke at openjdk.org Mon Nov 28 11:44:23 2022 From: duke at openjdk.org (Afshin Zafari) Date: Mon, 28 Nov 2022 11:44:23 GMT Subject: RFR: 8292741: Convert JvmtiTagMapTable to ResourceHashtable In-Reply-To: <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> References: <Dj9YfY10SLn8dR1Ez3kEQe4tXm1q5W1TS8_4r3gBUsY=.6c38825a-86e5-4575-a509-683786d242a6@github.com> <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> Message-ID: <WHjrI0S9ZDgqMCb9MAnk7v5trxY03bGQII0X3q9UUVI=.87213123-de73-4c99-ba39-2d6787d0badb@github.com> On Mon, 28 Nov 2022 01:20:49 GMT, David Holmes <dholmes at openjdk.org> wrote: >> test of tier1-5 passed. 
> > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 77: > >> 75: >> 76: void JvmtiTagMapTable::clear() { >> 77: struct RemoveAll{ > > Nit: space before { > > Stylistically I'm not sure we define local structs like this when defining a "closure" object. ?? I have seen these local structs in some places in the code (e.g., in concurrentHashTable.hpp), therefore I used it this way. Any preference? ------------- PR: https://git.openjdk.org/jdk/pull/11288 From duke at openjdk.org Mon Nov 28 11:50:35 2022 From: duke at openjdk.org (Afshin Zafari) Date: Mon, 28 Nov 2022 11:50:35 GMT Subject: RFR: 8292741: Convert JvmtiTagMapTable to ResourceHashtable In-Reply-To: <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> References: <Dj9YfY10SLn8dR1Ez3kEQe4tXm1q5W1TS8_4r3gBUsY=.6c38825a-86e5-4575-a509-683786d242a6@github.com> <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> Message-ID: <Q5SXU7qSQ0R8W48bq-p-mNuBKTtkkmD6XG8eL9O_5-8=.abb1cf1c-f169-43ec-936b-c5ee3873cade@github.com> On Mon, 28 Nov 2022 01:23:30 GMT, David Holmes <dholmes at openjdk.org> wrote: >> test of tier1-5 passed. > > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 95: > >> 93: //if (obj->fast_no_hash_check()) { >> 94: // return 0; >> 95: //} else { > > What are these comments? Coleen's suggestion for efficiency reasons. > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 102: > >> 100: } >> 101: >> 102: bool JvmtiTagMapTable::add(oop obj, jlong tag) { > > I'm not seeing that a return value has any use here when it is always expected to be true. ResourceHashTable::put() returns true if the Key,Value is added, false if the Value is updated. > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 123: > >> 121: >> 122: void JvmtiTagMapTable::remove_dead_entries(GrowableArray<jlong>* objects) { >> 123: struct IsDead{ > > Nit: space before { > > Same query about using a local struct for this. Alternative for struct? 
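
As a possible way to picture the two points raised above (a `put`-style call that reports "added" versus "updated", and a function-local struct used as a closure), here is a small sketch built on `std::unordered_map`. `TagTable`, `unlink` and `remove_entries_with_tag` are hypothetical names for illustration; this is not the HotSpot table and says nothing about its concurrency requirements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a tag table, for illustration only.
class TagTable {
 public:
  // Adds or updates the tag for obj. Returns true if a new entry was added,
  // false if an existing entry was updated.
  bool put(const void* obj, std::int64_t tag) {
    assert(obj != nullptr && "keys must be non-null");
    auto result = _table.emplace(obj, tag);
    if (!result.second) {
      result.first->second = tag;  // key already present: overwrite the tag
    }
    return result.second;
  }

  // Returns the tag for obj, or 0 if obj has no entry.
  std::int64_t find(const void* obj) const {
    auto it = _table.find(obj);
    return it == _table.end() ? 0 : it->second;
  }

  // Removes every entry for which cl.do_entry() returns true and returns
  // the number of entries removed.
  template <typename Closure>
  std::size_t unlink(Closure& cl) {
    std::size_t removed = 0;
    for (auto it = _table.begin(); it != _table.end();) {
      if (cl.do_entry(it->first, it->second)) {
        it = _table.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    return removed;
  }

  std::size_t size() const { return _table.size(); }

 private:
  std::unordered_map<const void*, std::int64_t> _table;
};

// The closure is a struct local to the function that uses it, which keeps
// the predicate next to its only caller.
inline std::size_t remove_entries_with_tag(TagTable& table, std::int64_t tag) {
  struct HasTag {
    std::int64_t _tag;
    bool do_entry(const void* /*obj*/, std::int64_t entry_tag) const {
      return entry_tag == _tag;
    }
  } has_tag{tag};
  return table.unlink(has_tag);
}
```

In this sketch a single `put` covers what the reviewed code does with `remove` followed by `add`, and the boolean result tells the caller which of the two cases happened. The alternative to the local struct would be a named class at namespace scope implementing the same `do_entry` member.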
-------------

PR: https://git.openjdk.org/jdk/pull/11288

From aph at openjdk.org Mon Nov 28 11:51:35 2022
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 28 Nov 2022 11:51:35 GMT
Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks
In-Reply-To: <Fwg13t4cqJUgK4rpn-89Z61X5HEhWWoyaHWVtApQlgQ=.1581e8d4-e0b4-40d5-afd4-98df7d9d57ae@github.com>
References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> <nyPkIJiAmw69cI1CZAlmR2Km0-bVCxXuM3VZR6pVPUs=.b4515771-acf9-4653-ae6c-892d9f508b66@github.com> <j_nc7lnePGF3rMpnlsERrZjufWAWxTsahtZp1z13WQk=.521e0a62-9c75-426b-ac15-f6b63b4f69da@github.com> <Fwg13t4cqJUgK4rpn-89Z61X5HEhWWoyaHWVtApQlgQ=.1581e8d4-e0b4-40d5-afd4-98df7d9d57ae@github.com>
Message-ID: <rke6K8n4uSzbZIBiU6B3yITnzyZU_yqJ1363FJFIUY8=.38403baa-d633-4ae7-ae59-9f946391b1b3@github.com>

On Sun, 13 Nov 2022 20:52:17 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> > Drive-by comment: there is `os::pretouch_memory(void* start, void* end, size_t page_size)` ;)
> 
> Good point. Had to cast the volatile away though.

Be careful with that. On some OSes, touching more than N pages below the currently lowest-mapped stack page will segfault. Therefore you must touch from the top down.

-------------

PR: https://git.openjdk.org/jdk/pull/10403

From eosterlund at openjdk.org Mon Nov 28 11:52:22 2022
From: eosterlund at openjdk.org (Erik Österlund)
Date: Mon, 28 Nov 2022 11:52:22 GMT
Subject: RFR: 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing [v2]
In-Reply-To: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com>
References: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com>
Message-ID: <Z3p1R4bbI5blRkjjklVqHrqGfgy7lCmM5JPgbq67im0=.36d7cede-17a5-442d-8dfc-9c4e8afece2b@github.com>

> There is a stack walk in JvmtiExport::post_exception_throw() that has safepoints in it.
This trips up the stack watermark code. This patch adds a RAII object to JvmtiExport::post_exception_throw() that keeps the thread and its stack fully processed throughout the function.
> Testing: tier1-7 of ZGC tests on linux x86_64 debug and manual testing of the test that failed.

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  Add comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/11238/files
  - new: https://git.openjdk.org/jdk/pull/11238/files/6aabc24b..f59ead1b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11238&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11238&range=00-01

  Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/11238.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11238/head:pull/11238

PR: https://git.openjdk.org/jdk/pull/11238

From eosterlund at openjdk.org Mon Nov 28 11:52:23 2022
From: eosterlund at openjdk.org (Erik Österlund)
Date: Mon, 28 Nov 2022 11:52:23 GMT
Subject: RFR: 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing
In-Reply-To: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com>
References: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com>
Message-ID: <vLsBazsZOWCQ9ftXPfYsDjlOmc0lu5ckk-r9a0-LWCw=.951c8666-a4fb-41ff-84f2-e15971c51244@github.com>

On Fri, 18 Nov 2022 12:30:19 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

> There is a stack walk in JvmtiExport::post_exception_throw() that has safepoints in it. This trips up the stack watermark code. This patch adds a RAII object to JvmtiExport::post_exception_throw() that keeps the thread and its stack fully processed throughout the function.
> Testing: tier1-7 of ZGC tests on linux x86_64 debug and manual testing of the test that failed.
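
The RAII idea in the quoted description can be sketched in isolation as follows. Everything here is a toy under stated assumptions: `ToyThread`, `ToyKeepProcessedMark` and `toy_post_exception_throw` are hypothetical names, and the counter merely stands in for whatever state the real patch maintains. The point is only that the guarantee is established in a constructor and released in a destructor, so it holds on every path out of the function.

```cpp
#include <cassert>

// Hypothetical stand-in for per-thread state, for illustration only.
struct ToyThread {
  int keep_processed_depth = 0;  // > 0 while some mark is alive
  bool stack_kept_processed() const { return keep_processed_depth > 0; }
};

// Scope guard: the guarantee holds from construction to destruction.
class ToyKeepProcessedMark {
 public:
  explicit ToyKeepProcessedMark(ToyThread* thread) : _thread(thread) {
    assert(_thread != nullptr && "mark needs a thread");
    ++_thread->keep_processed_depth;
  }
  ~ToyKeepProcessedMark() { --_thread->keep_processed_depth; }

  // Copying would release the guarantee twice, so it is disabled.
  ToyKeepProcessedMark(const ToyKeepProcessedMark&) = delete;
  ToyKeepProcessedMark& operator=(const ToyKeepProcessedMark&) = delete;

 private:
  ToyThread* _thread;
};

// Simulated event-posting function. Returns whether the guarantee held
// while the (simulated) stack walk was in progress.
inline bool toy_post_exception_throw(ToyThread* thread) {
  ToyKeepProcessedMark mark(thread);
  // A real stack walk, with possible safepoints, would happen here.
  return thread->stack_kept_processed();
}
```

Because the counter is restored in the destructor, early returns added later inside the function cannot leave the thread in the "kept processed" state by accident.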
Thanks for the reviews, @dholmes-ora, @pchilano and @sspitsyn! I added a comment as requested. ------------- PR: https://git.openjdk.org/jdk/pull/11238 From ayang at openjdk.org Mon Nov 28 11:57:42 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 28 Nov 2022 11:57:42 GMT Subject: RFR: 8297499: Parallel: Missing iteration over klass when marking objArrays/objArrayOops during Full GC In-Reply-To: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> References: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> Message-ID: <tQWhAcAxPQyMi5pK-p_fGhM0kewq_nA3iO_lJLbb410=.e1a71165-8a29-4b7b-88e4-700dc8e8bb9d@github.com> On Wed, 23 Nov 2022 12:55:55 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Extending the current class-unloading test to expose a pre-existing issue in Parallel and the fix. > > Test: the revised test fails for Parallel without the fix Thanks for the review. ------------- PR: https://git.openjdk.org/jdk/pull/11321 From shade at openjdk.org Mon Nov 28 11:59:38 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Nov 2022 11:59:38 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks In-Reply-To: <rke6K8n4uSzbZIBiU6B3yITnzyZU_yqJ1363FJFIUY8=.38403baa-d633-4ae7-ae59-9f946391b1b3@github.com> References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> <nyPkIJiAmw69cI1CZAlmR2Km0-bVCxXuM3VZR6pVPUs=.b4515771-acf9-4653-ae6c-892d9f508b66@github.com> <j_nc7lnePGF3rMpnlsERrZjufWAWxTsahtZp1z13WQk=.521e0a62-9c75-426b-ac15-f6b63b4f69da@github.com> <Fwg13t4cqJUgK4rpn-89Z61X5HEhWWoyaHWVtApQlgQ=.1581e8d4-e0b4-40d5-afd4-98df7d9d57ae@github.com> <rke6K8n4uSzbZIBiU6B3yITnzyZU_yqJ1363FJFIUY8=.38403baa-d633-4ae7-ae59-9f946391b1b3@github.com> Message-ID: <x_yGlCu0KA3UrwhIN7aRRL3frOVuZepHzZIXKVy1FRc=.4572313a-d9b1-4916-93e2-4054e6f083fc@github.com> On Mon, 28 Nov 2022 11:49:07 GMT, Andrew 
Haley <aph at openjdk.org> wrote: > > > Drive-by comment: there is `os::pretouch_memory(void* start, void* end, size_t page_size)` ;) > > > > > > Good point. Had to cast the volatile away though. > > Be careful with that. On some OSes, touching more than N pages below the currently lowest-mapped stack page will segfault. Therefore you must touch from the top down. See `os::map_stack_shadow_pages()` for this. I think the pre-touching like this should be done as "extended" "preemptive" stack bang up to `stack_shadow_safe_limit`. `os::pretouch_memory` is fine for heap memory, but for thread stacks I think we need to hook into the usual stack overflow machinery. ------------- PR: https://git.openjdk.org/jdk/pull/10403 From thartmann at openjdk.org Mon Nov 28 12:01:43 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Nov 2022 12:01:43 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v2] In-Reply-To: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> Message-ID: <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> > `Method::build_profiling_method_data` acquires the `MethodData_lock` when initializing `Method::_method_data` to prevent multiple allocations by different threads. The problem is that when metaspace allocation fails and `JvmtiExport::should_post_resource_exhausted()` is set, we assert during the `ThreadToNativeFromVM` transition in JVMTI code. 
> > Since concurrent initialization is a rare event, I suggest to get rid of the lock and perform the initialization with a `cmpxchg`, similar to how method counters are initialized: > https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L644-L646 > > Since [current code](https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.inline.hpp#L41-L46) in `Method::set_method_data` uses a `Atomic::release_store`, I added a `OrderAccess::release()`. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Removed OrderAccess::release, added assert and adjusted ciReplay code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11316/files - new: https://git.openjdk.org/jdk/pull/11316/files/8c234464..7e94c2b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11316&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11316&range=00-01 Stats: 18 lines in 3 files changed: 1 ins; 12 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/11316.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11316/head:pull/11316 PR: https://git.openjdk.org/jdk/pull/11316 From ayang at openjdk.org Mon Nov 28 12:02:08 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 28 Nov 2022 12:02:08 GMT Subject: Integrated: 8297499: Parallel: Missing iteration over klass when marking objArrays/objArrayOops during Full GC In-Reply-To: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> References: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> Message-ID: <IK6wSYLj8l31sP-PR47KHtmHTa0Uon2VGuTV4G4RsKg=.e16616ea-00c5-4693-ab87-40c74f6d35f6@github.com> On Wed, 23 Nov 2022 12:55:55 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Extending the current class-unloading test to expose a pre-existing 
issue in Parallel and the fix. > > Test: the revised test fails for Parallel without the fix This pull request has now been integrated. Changeset: 6a856bc3 Author: Albert Mingkun Yang <ayang at openjdk.org> URL: https://git.openjdk.org/jdk/commit/6a856bc3f67d539f858904667ee86cbed54f94f7 Stats: 55 lines in 2 files changed: 43 ins; 3 del; 9 mod 8297499: Parallel: Missing iteration over klass when marking objArrays/objArrayOops during Full GC Co-authored-by: Stefan Johansson <sjohanss at openjdk.org> Reviewed-by: sjohanss, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/11321 From thartmann at openjdk.org Mon Nov 28 12:08:47 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Nov 2022 12:08:47 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v2] In-Reply-To: <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> Message-ID: <_3SkZlJhWP_RHEgQDrTfSPp9pv8sS8jM4ewrIhucIcg=.a256713b-1f9a-45a7-a13f-e101ceca297b@github.com> On Mon, 28 Nov 2022 12:01:43 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> `Method::build_profiling_method_data` acquires the `MethodData_lock` when initializing `Method::_method_data` to prevent multiple allocations by different threads. The problem is that when metaspace allocation fails and `JvmtiExport::should_post_resource_exhausted()` is set, we assert during the `ThreadToNativeFromVM` transition in JVMTI code. 
>> >> Since concurrent initialization is a rare event, I suggest to get rid of the lock and perform the initialization with a `cmpxchg`, similar to how method counters are initialized: >> https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L644-L646 >> >> Since [current code](https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.inline.hpp#L41-L46) in `Method::set_method_data` uses a `Atomic::release_store`, I added a `OrderAccess::release()`. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed OrderAccess::release, added assert and adjusted ciReplay code Thanks for the review, David! I added an assert to `MethodData::allocate` and fixed the ciReplay code, which only ever gets executed by a single thread, by removing the lock. I also removed the `OrderAccess::release()`. Regarding the overhead of going lock-free: I temporarily added code that counts the number of times that multiple threads attempt initialization and asserts if it's more than 50x. This triggers in 25 out of 203.772 runs (tests from tier1 to tier3). The average size of these allocations is around 88 words (704 bytes). I therefore think it's fine to avoid a lock in this case. Also, we already use a lock-free mechanism for `Method::build_method_counters`. What do you think?
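For readers following the lock-free argument, the general "allocate, then publish with a compare-and-swap" pattern looks roughly like the sketch below. It uses standard C++ `std::atomic` rather than HotSpot's `Atomic` API, and `Data`, `Holder` and `get_or_init` are hypothetical names for illustration only. The cost discussed above corresponds to the `delete fresh` branch: a thread that loses the race has made an allocation it must throw away.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload and owner types, invented for this sketch.
struct Data { int value; };

struct Holder {
  std::atomic<Data*> data{nullptr};
};

// Returns the pointer that ended up installed in the holder.
// Several threads may allocate concurrently, but only one CAS succeeds;
// the losers free their allocation and use the winner's object.
Data* get_or_init(Holder& h, int v) {
  Data* cur = h.data.load(std::memory_order_acquire);
  if (cur != nullptr) {
    return cur;  // already initialized, no allocation needed
  }
  Data* fresh = new Data{v};
  Data* expected = nullptr;
  if (h.data.compare_exchange_strong(expected, fresh,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
    return fresh;  // this thread published its object
  }
  delete fresh;    // lost the race: the wasted allocation
  return expected; // on failure, expected holds the winner's pointer
}
```

No lock is held while allocating, so an allocation failure path can never run with a lock owned, which is the property the fix is after.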
------------- PR: https://git.openjdk.org/jdk/pull/11316 From jvernee at openjdk.org Mon Nov 28 12:11:00 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 28 Nov 2022 12:11:00 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v31] In-Reply-To: <S9AFA1STby7c240-cSlvO-e0MekMt-uKHFAqUbOnoOU=.fdab60c5-eeb5-4f5e-b189-409225b7500f@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <S9AFA1STby7c240-cSlvO-e0MekMt-uKHFAqUbOnoOU=.fdab60c5-eeb5-4f5e-b189-409225b7500f@github.com> Message-ID: <y4Faezp2kScZfX03dUS2tr_xXBDBIBoRiAOdrjIbIFs=.0c07c6e6-4d60-4e20-93cd-20c5cfdee93d@github.com> On Wed, 23 Nov 2022 17:33:06 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > * remove unused Scoped interface > * re-add trusting of final fields in layout class implementations > * Fix BulkOps benchmark, which had alignment issues Latest version looks good to me as well ------------- Marked as reviewed by jvernee (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10872 From jvernee at openjdk.org Mon Nov 28 12:14:50 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 28 Nov 2022 12:14:50 GMT Subject: RFR: 8296477: Foreign linker implementation update following JEP 434 [v9] In-Reply-To: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> References: <CGd4JSefJvfEVkZEfORzthEIeV53kLk_UkZHAhJGrQ0=.7741b5f2-2227-4017-8164-d54fb9d30d10@github.com> Message-ID: <BUJkD2uAC18LKEIc_wAinv7-3gCa14Yk9n4SdWjP1ik=.160754de-67bc-46e3-80fc-9c8cf8b173e6@github.com> > Pull in linker implementation changes, that include non-trivial changes to VM code, from the panama-foreign repo into the main JDK. > > This is split off from the main JEP integration to make reviewing easier. > > This includes the following patches: > > 1. https://github.com/openjdk/panama-foreign/pull/698 > 2. https://github.com/openjdk/panama-foreign/pull/699 > 3. (part of) https://github.com/openjdk/panama-foreign/pull/731 > 4. https://github.com/openjdk/panama-foreign/pull/740 > 5. https://github.com/openjdk/panama-foreign/pull/746 > 6. https://github.com/openjdk/panama-foreign/pull/742 > 7. https://github.com/openjdk/panama-foreign/pull/743 > > Probably the biggest change to the code comes from replacing `VMReg` - which can not represent offsets into the stack that are not a multiple of the VM's stack slot size (32-bits) - with the new `VMStorage` class, which can describe byte offsets into the stack, as well as having a register mask to indicate only certain register segments. > > The only part of 3. that is in this PR is the part that turns the `VMStorage` class in Java into a record. > > Please refer to the PR of each individual patch for a more detailed description. Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 23 additional commits since the last revision: - use Arena in example - Merge branch 'PR_20' into VM_Changes - drop .inline from vmstorage header names - 8296973: saving errno on a value-returning function crashes the JVM Reviewed-by: mcimadamore - fix stubs - constexpr some functions - Review pt1 - Tweak copyright headers - Use @requires to disable some tests on x86 - Use AssertionError for internal exceptions - ... and 13 more: https://git.openjdk.org/jdk/compare/bbde3878...75917216 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11019/files - new: https://git.openjdk.org/jdk/pull/11019/files/03be64c9..75917216 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11019&range=07-08 Stats: 16892 lines in 695 files changed: 7158 ins; 6214 del; 3520 mod Patch: https://git.openjdk.org/jdk/pull/11019.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11019/head:pull/11019 PR: https://git.openjdk.org/jdk/pull/11019 From eosterlund at openjdk.org Mon Nov 28 12:14:59 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 28 Nov 2022 12:14:59 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v5] In-Reply-To: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> Message-ID: <ZRDAWrDdtHTCHEanKDcQhADyr3tYrFkR4qqWh2P3fkE=.5f4bfbe8-2943-49a6-8711-79b0176a17ac@github.com> > The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. > > In particular, > 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. 
With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. > > 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. > > 3) Refactoring the stack chunk allocation code > > Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. 
Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Patricio concerns ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11111/files - new: https://git.openjdk.org/jdk/pull/11111/files/3de25624..ad52bc7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11111&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11111&range=03-04 Stats: 7 lines in 1 file changed: 0 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11111.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11111/head:pull/11111 PR: https://git.openjdk.org/jdk/pull/11111 From eosterlund at openjdk.org Mon Nov 28 12:15:03 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 28 Nov 2022 12:15:03 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v4] In-Reply-To: <nli1ZOZrAQ_vNgtG9Rvi516a99ClHUIblH8LCoNj9ag=.a67626f1-5cac-412c-b8c1-1aba29629348@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <dgIMUPeDO7sqZR_QCaPi6JpoZJKCNJ9-QoN97QuZ-y8=.6bf9ec32-8a59-4646-a021-d98ff53f36c4@github.com> <nli1ZOZrAQ_vNgtG9Rvi516a99ClHUIblH8LCoNj9ag=.a67626f1-5cac-412c-b8c1-1aba29629348@github.com> Message-ID: <N0BoHAwd4q98Fbpw76SLXL71sMxDPrRUpDG4_OLuvTo=.cf5d2277-86a0-4715-a34f-2c635b7e58a4@github.com> On Mon, 21 Nov 2022 12:17:02 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Richard comments > > I went through the changes and all looks good to me. Only minor comments. > > Thanks, > Patricio Thanks for the review @pchilano! I made the changes you requested.
> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1393: > >> 1391: // Guaranteed to be in young gen / newly allocated memory >> 1392: assert(!chunk->requires_barriers(), "Unfamiliar GC requires barriers on TLAB allocation"); >> 1393: _barriers = false; > > Do we need to explicitly set _barriers to false? It's already initialized to be false (same above for the UseZGC case). That would also allow to simplify the code a bit I think to be just an if statement that calls requires_barriers() for the "ZGC_ONLY(!UseZGC &&) (SHENANDOAHGC_ONLY(UseShenandoahGC ||) allocator.took_slow_path())" case, and then ZGC and the fast path could use just separate asserts outside conditionals. It's mainly there to improve readability at the moment. The simplification you have in mind applies well now, but unfortunately doesn't apply well for generational ZGC. And the main point of this PR is to prepare for generational ZGC integration. So I would prefer to leave it the way it is, if you are okay with that? ------------- PR: https://git.openjdk.org/jdk/pull/11111 From shade at openjdk.org Mon Nov 28 12:26:41 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Nov 2022 12:26:41 GMT Subject: RFR: 8297600: Check current thread in selected JRT_LEAF methods [v2] In-Reply-To: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> References: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> Message-ID: <N1mTZ0VCshuA_t4ifX4i6U1yhLGgGf0L99OK6vAhbmw=.5149a141-1567-46d6-af32-6449bcaa3c19@github.com> > With [JDK-8275286](https://bugs.openjdk.org/browse/JDK-8275286), we added the `Thread::current()` checks for most of the JRT entries. But `JRT_LEAF` is still not checked, because not every `JRT_LEAF` carries a `JavaThread` argument. Having assertions there helps for two reasons. 
First, these methods can be called from the stub/compiler code, which might be erroneous with thread handling (especially in x86_32 that does not have a dedicated thread register). Second, in the post-Loom world, current thread can change suddenly, as evidenced here: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2022-November/060779.html. > > We can add the thread checks to relevant `JRT_LEAF` methods that accept `JavaThread*` too. > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier2` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Revert some additions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11359/files - new: https://git.openjdk.org/jdk/pull/11359/files/cfd86289..cde0c198 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11359&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11359&range=00-01 Stats: 13 lines in 3 files changed: 0 ins; 13 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11359.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11359/head:pull/11359 PR: https://git.openjdk.org/jdk/pull/11359 From shade at openjdk.org Mon Nov 28 12:26:42 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Nov 2022 12:26:42 GMT Subject: RFR: 8297600: Check current thread in selected JRT_LEAF methods [v2] In-Reply-To: <xskkJCXnjhdPFIjQ1TlA99E6RaCLON41SkXM4z75KnQ=.ec771a19-fab2-417c-bc6b-98da99c0a0a8@github.com> References: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> <xskkJCXnjhdPFIjQ1TlA99E6RaCLON41SkXM4z75KnQ=.ec771a19-fab2-417c-bc6b-98da99c0a0a8@github.com> Message-ID: <5_vN_U3k0tUArJD0XcWBn4_a7pFKxt-PmQT677--DcQ=.673f717d-ae71-460e-93fc-0036473330c9@github.com> On Sat, 26 Nov 2022 08:18:42 GMT, David Holmes <dholmes at openjdk.org> wrote: > Unclear for many 
JVMCI functions that the thread argument is actually intended/required to be the current thread. It seems unused in many cases so why is it passed? Yes, I agree the initial patch over-reached in some places. Please see new commit, which reduces it. I left `thread == JavaThread::current()` checks where I can argue the threads are expected to be current. ------------- PR: https://git.openjdk.org/jdk/pull/11359 From kbarrett at openjdk.org Mon Nov 28 12:31:11 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 28 Nov 2022 12:31:11 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas [v2] In-Reply-To: <pPy5eCgVCXREucMdNtbj_xx_h7oeZOhyYfh23RTMa8M=.7b59b2b3-8ca3-4c06-8824-a9ca661e2a32@github.com> References: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> <pPy5eCgVCXREucMdNtbj_xx_h7oeZOhyYfh23RTMa8M=.7b59b2b3-8ca3-4c06-8824-a9ca661e2a32@github.com> Message-ID: <OjOfTZGOPyTc516trNb762BQj3jTtL0mJBfcZP_LO9Q=.dda9d596-69af-4ae5-90b2-de56faf4dae9@github.com> On Sun, 27 Nov 2022 13:49:41 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> Add alignas to the permitted features set. Though the corresponding entry mentions this should not be done for classes, there's no actual difference in practice with all our supported compilers, because their nonstandard syntax also has the same limitations and issues with dynamic allocation as the C++ alignas, and including such a restriction of falling back to ATTRIBUTE_ALIGNED in the case of classes in the style guide would ultimately not really serve much of a point > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Rectify issues mentioned in review Changes requested by kbarrett (Reviewer). 
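As background for the `alignas` discussion, here is a minimal sketch of the standard C++11 feature on a class type and on a variable definition. `PaddedCounter`, `buffer` and `is_aligned` are hypothetical names made up for the example; nothing here is taken from the HotSpot sources or from the proposed style-guide text. Note that before C++17, plain `new` is not required to honor over-aligned types, which is presumably the dynamic-allocation limitation the PR description alludes to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// alignas applied to a class definition: every object of this type is
// 64-byte aligned, and sizeof is rounded up to a multiple of the alignment.
struct alignas(64) PaddedCounter {
  long value;
};

static_assert(alignof(PaddedCounter) == 64, "type carries the requested alignment");
static_assert(sizeof(PaddedCounter) % 64 == 0, "size is padded to the alignment");

// alignas applied to a variable definition with static storage duration.
alignas(32) unsigned char buffer[48];

// Small helper to check an address against an alignment at run time.
bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

The sketch deliberately sticks to objects with static storage duration; over-aligned automatic variables and heap allocations are where implementation limits tend to show up.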
doc/hotspot-style.md line 657: > 655: > 656: `alignas` > 657: ([n1877](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1877.pdf)) n1877 is not the final version of the proposal; see [n2341](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2341.pdf) doc/hotspot-style.md line 667: > 665: within HotSpot, at least until we switch to using C++17. See the review for > 666: [JDK-8252584](https://github.com/openjdk/jdk/pull/11315) > 667: for more information. There have been a couple of relevant defect reports: CWG 2354 - https://cplusplus.github.io/CWG/issues/2354.html CWG 1437 - https://cplusplus.github.io/CWG/issues/1437.html It turns out alignas must be applied to a definition, with declarations optionally having alignas equivalent to the definition: C++14 10.6.2/6 - https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf. I found the structure of the proposed text somewhat difficult to follow. Limits on the maximum alignment need to be explored and documented. The maximum for an automatic variable might be significantly limited, or maybe just risks blowing the stack. I think we don't have any existing uses of aligned automatic variables, so maybe we don't need to permit them? For now I'll leave that in place, but I could easily be convinced to remove it. 
So I suggest something like the following: https://github.com/openjdk/jdk/compare/master...kimbarrett:openjdk-jdk:alignas?expand=1 ------------- PR: https://git.openjdk.org/jdk/pull/11315 From stuefe at openjdk.org Mon Nov 28 12:45:08 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 28 Nov 2022 12:45:08 GMT Subject: RFR: JDK-8297660: x86: Redundant test+jump in C1 allocateArray [v3] In-Reply-To: <YjFViZEurRxai1pihV4FK8Nd7r0hWm38VOasPhmuz8g=.d8aec25c-e306-40ad-8d74-c6f99ac321c2@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> <YjFViZEurRxai1pihV4FK8Nd7r0hWm38VOasPhmuz8g=.d8aec25c-e306-40ad-8d74-c6f99ac321c2@github.com> Message-ID: <UCl2CuE7semDKNINd6koJe9rUHaF9otRepbW7IwFFb4=.0cfc37e3-6a00-427a-b8db-2eceaa902a4e@github.com> On Mon, 28 Nov 2022 07:37:35 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: >> In `C1_MacroAssembler::initialize_body()` we test the input array length for 0. We do this again in `MacroAssembler::zero_memory()`. This results in a redundant test+jump instruction: >> >> >> 13442 0x00007f58a8ca2f02: sub $0x10,%rsi >> 13443 0x00007f58a8ca2f06: je 0x00007f58a8ca2f26 << >> 13444 0x00007f58a8ca2f0c: test %rsi,%rsi >> 13445 0x00007f58a8ca2f0f: je 0x00007f58a8ca2f26 << >> 13446 0x00007f58a8ca2f15: xor %rbx,%rbx >> 13447 0x00007f58a8ca2f18: shr $0x3,%rsi >> 13448 0x00007f58a8ca2f1c: mov %rbx,0x8(%rax,%rsi,8) >> 13449 0x00007f58a8ca2f21: dec %rsi >> 13450 0x00007f58a8ca2f24: jne 0x00007f58a8ca2f1c ;*anewarray {reexecute=0 rethrow=0 return_oop=0} >> 13451 ; - java.lang.invoke.MethodHandles::<clinit>@24 (line 5109) >> >> >> Since `MacroAssembler::zero_memory()` is only ever called from `C1_MacroAssembler::initialize_body()`, it does not need to test for len=0, since its caller already does. >> >> --- >> >> Patch removes one test+jump and adds an assertion for len>0 to zero_memory. >> >> Patch ran through SAP nightlies and GHAs. 
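The shape of the change described above (one zero-length check in the caller, an assertion in the callee) can be sketched in plain C++. This is an analogy under stated assumptions, not the assembler code: `zero_words` and this `initialize_body` are hypothetical functions that only loosely mirror the count-down store loop in the disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Callee: relies on the caller having filtered out empty ranges, so it
// asserts instead of testing and branching again.
void zero_words(std::uint64_t* base, std::size_t len_in_words) {
  assert(len_in_words > 0 && "caller must filter out empty ranges");
  do {
    --len_in_words;
    base[len_in_words] = 0;  // store zero, counting down to index 0
  } while (len_in_words != 0);
}

// Caller: performs the single zero-length check for both functions.
void initialize_body(std::uint64_t* base, std::size_t len_in_words) {
  if (len_in_words == 0) {
    return;  // nothing to clear
  }
  zero_words(base, len_in_words);
}
```

The assertion documents the precondition in debug builds and costs nothing in product builds, whereas the duplicated test and jump would be emitted on every allocation.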
> > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into double-test-allocate-array > - remove outer conditional jump > - Revert "remove-redundant-test-from-zeromemory" > > This reverts commit 4f95969d4d3026ce2310230c37f469579dc32e88. > - remove-redundant-test-from-zeromemory x86 error for `jshell/Test8294583.java` unrelated. Thanks, @y1yang0 and @theRealAph for reviewing. ------------- PR: https://git.openjdk.org/jdk/pull/11372 From stuefe at openjdk.org Mon Nov 28 12:49:45 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 28 Nov 2022 12:49:45 GMT Subject: Integrated: JDK-8297660: x86: Redundant test+jump in C1 allocateArray In-Reply-To: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> References: <Q35BpB6opx9NitxeTr4cTeySMztmfHe0SgAeWBx1qOI=.b65af699-4069-4c3a-81e5-5e7254ce4b98@github.com> Message-ID: <a9EpA16xJmD19VATShJEq0m93pL6l7_RKa2k-7wqt_8=.4d99d0e1-ba58-44ce-b5cd-3e3e20cf5604@github.com> On Fri, 25 Nov 2022 18:24:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > In `C1_MacroAssembler::initialize_body()` we test the input array length for 0. We do this again in `MacroAssembler::zero_memory()`. 
This results in a redundant test+jump instruction: > > > 13442 0x00007f58a8ca2f02: sub $0x10,%rsi > 13443 0x00007f58a8ca2f06: je 0x00007f58a8ca2f26 << > 13444 0x00007f58a8ca2f0c: test %rsi,%rsi > 13445 0x00007f58a8ca2f0f: je 0x00007f58a8ca2f26 << > 13446 0x00007f58a8ca2f15: xor %rbx,%rbx > 13447 0x00007f58a8ca2f18: shr $0x3,%rsi > 13448 0x00007f58a8ca2f1c: mov %rbx,0x8(%rax,%rsi,8) > 13449 0x00007f58a8ca2f21: dec %rsi > 13450 0x00007f58a8ca2f24: jne 0x00007f58a8ca2f1c ;*anewarray {reexecute=0 rethrow=0 return_oop=0} > 13451 ; - java.lang.invoke.MethodHandles::<clinit>@24 (line 5109) > > > Since `MacroAssembler::zero_memory()` is only ever called from `C1_MacroAssembler::initialize_body()`, it does not need to test for len=0, since its caller already does. > > --- > > Patch removes one test+jump and adds an assertion for len>0 to zero_memory. > > Patch ran through SAP nightlies and GHAs. This pull request has now been integrated. Changeset: c05dc802 Author: Thomas Stuefe <stuefe at openjdk.org> URL: https://git.openjdk.org/jdk/commit/c05dc80234a6beff3fa4d2de3228928c639da083 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8297660: x86: Redundant test+jump in C1 allocateArray Reviewed-by: aph, yyang ------------- PR: https://git.openjdk.org/jdk/pull/11372 From mdoerr at openjdk.org Mon Nov 28 13:11:13 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Nov 2022 13:11:13 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v5] In-Reply-To: <ZRDAWrDdtHTCHEanKDcQhADyr3tYrFkR4qqWh2P3fkE=.5f4bfbe8-2943-49a6-8711-79b0176a17ac@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <ZRDAWrDdtHTCHEanKDcQhADyr3tYrFkR4qqWh2P3fkE=.5f4bfbe8-2943-49a6-8711-79b0176a17ac@github.com> Message-ID: <CnrrwepIhJXFUIs0e54nE048zsDUq-wqjJu_6RXB0No=.bb18857e-b992-4290-aa51-3f98e57d0f0f@github.com> On Mon, 28 Nov 2022 12:14:59 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> The current loom code
makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. >> >> In particular, >> 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. >> >> 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. >> >> 3) Refactoring the stack chunk allocation code >> >> Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. 
> > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Patricio concerns I think PPC64 needs the change, too, now: https://github.com/openjdk/jdk/blob/c05dc80234a6beff3fa4d2de3228928c639da083/src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp#L1660 ------------- PR: https://git.openjdk.org/jdk/pull/11111 From rrich at openjdk.org Mon Nov 28 14:17:35 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 28 Nov 2022 14:17:35 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v2] In-Reply-To: <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> Message-ID: <DB9BaZDNMn-V2tsyr5iD90cRHC0AHgEbHedkv8AlH40=.82fc6c0b-f63e-4283-841b-fb7bb463deaf@github.com> On Mon, 28 Nov 2022 12:01:43 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> `Method::build_profiling_method_data` acquires the `MethodData_lock` when initializing `Method::_method_data` to prevent multiple allocations by different threads. The problem is that when metaspace allocation fails and `JvmtiExport::should_post_resource_exhausted()` is set, we assert during the `ThreadToNativeFromVM` transition in JVMTI code.
>> >> Since concurrent initialization is a rare event, I suggest to get rid of the lock and perform the initialization with a `cmpxchg`, similar to how method counters are initialized: >> https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L644-L646 >> >> Since [current code](https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.inline.hpp#L41-L46) in `Method::set_method_data` uses a `Atomic::release_store`, I added a `OrderAccess::release()`. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed OrderAccess::release, added assert and adjusted ciReplay code Thanks for taking care of the issue. The fix looks ok to me. Seems like you've removed all uses of `MethodData_lock` so you could remove the declaration and definition too. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR: https://git.openjdk.org/jdk/pull/11316 From coleenp at openjdk.org Mon Nov 28 14:21:48 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 28 Nov 2022 14:21:48 GMT Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability [v2] In-Reply-To: <qZ1KS_MbDBbazQdi8qQNeaFqgzCFGcNEGH5wlRvYFZk=.e3900cf8-1acb-41bb-9126-ede3954cdb10@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> <qZ1KS_MbDBbazQdi8qQNeaFqgzCFGcNEGH5wlRvYFZk=.e3900cf8-1acb-41bb-9126-ede3954cdb10@github.com> Message-ID: <FpMmUVaBChQYc-v4Nc_wTEgp4cuPXcnLCGvLGTP4JIs=.992c8553-df35-449d-b3fd-d132e70b1fc4@github.com> On Thu, 17 Nov 2022 10:29:34 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> Refactor the STEP macro in VMError::report to improve readability. >> Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand. 
>> >> This enhancement aims to do two things: >> 1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer. >> 2. Separates the internal step logic from the decision logic which decides if a step should be taken with a STEP_IF(step_name_str, condition) macro >> >> Testing: tier 1 + GHA > > Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: > > - Follow HotSpot code style: no implicit boolean > - Respect 100 character line > - Revert extended test I apologize for the latency in reviewing this. It looks really nice. Thank you. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/11018 From thartmann at openjdk.org Mon Nov 28 14:33:39 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Nov 2022 14:33:39 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v3] In-Reply-To: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> Message-ID: <uNoa_1yi6xIqGrMPQbNTIJBoU9kDYFueWhx4oyMfXTU=.48904340-c641-40b7-a225-4b186ea1bb32@github.com> > `Method::build_profiling_method_data` acquires the `MethodData_lock` when initializing `Method::_method_data` to prevent multiple allocations by different threads. The problem is that when metaspace allocation fails and `JvmtiExport::should_post_resource_exhausted()` is set, we assert during the `ThreadToNativeFromVM` transition in JVMTI code. 
> > Since concurrent initialization is a rare event, I suggest to get rid of the lock and perform the initialization with a `cmpxchg`, similar to how method counters are initialized: > https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L644-L646 > > Since [current code](https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.inline.hpp#L41-L46) in `Method::set_method_data` uses a `Atomic::release_store`, I added a `OrderAccess::release()`. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Completely removed MethodData_lock ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11316/files - new: https://git.openjdk.org/jdk/pull/11316/files/7e94c2b2..4b81ea4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11316&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11316&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11316.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11316/head:pull/11316 PR: https://git.openjdk.org/jdk/pull/11316 From thartmann at openjdk.org Mon Nov 28 14:33:42 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Nov 2022 14:33:42 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v2] In-Reply-To: <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> Message-ID: <dTXoILbV4ZY-3dQZ3sAJRccuyvPDpLkzZeD6qisXhTU=.324e3298-d58e-42ca-a508-df06407dcb95@github.com> On Mon, 28 Nov 2022 12:01:43 GMT, Tobias Hartmann <thartmann at openjdk.org> 
wrote: >> `Method::build_profiling_method_data` acquires the `MethodData_lock` when initializing `Method::_method_data` to prevent multiple allocations by different threads. The problem is that when metaspace allocation fails and `JvmtiExport::should_post_resource_exhausted()` is set, we assert during the `ThreadToNativeFromVM` transition in JVMTI code. >> >> Since concurrent initialization is a rare event, I suggest to get rid of the lock and perform the initialization with a `cmpxchg`, similar to how method counters are initialized: >> https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L644-L646 >> >> Since [current code](https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.inline.hpp#L41-L46) in `Method::set_method_data` uses a `Atomic::release_store`, I added a `OrderAccess::release()`. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed OrderAccess::release, added assert and adjusted ciReplay code Thanks for the review, Richard. Good catch, I removed the `MethodData_lock`. 
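The lock-free initialization described in the quoted PR text can be sketched as follows. This is only an illustration in Java, not the HotSpot change itself (which is C++ and uses `cmpxchg` on `Method::_method_data`); the names `LazyProfile`, `ProfileData` and `getOrCreate` are hypothetical. Each thread allocates without holding a lock, and a compare-and-set decides which allocation is published.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for a lazily created profile; not a HotSpot class. */
final class LazyProfile {
    /** Hypothetical payload type used only for this sketch. */
    static final class ProfileData {
        final String owner;
        ProfileData(String owner) { this.owner = owner; }
    }

    private final AtomicReference<ProfileData> data = new AtomicReference<>();

    ProfileData getOrCreate(String owner) {
        ProfileData existing = data.get();
        if (existing != null) {
            return existing;            // fast path: already initialized
        }
        // Allocate without holding any lock, so an allocation failure
        // cannot happen while a lock is owned.
        ProfileData fresh = new ProfileData(owner);
        if (data.compareAndSet(null, fresh)) {
            return fresh;               // this thread published its object
        }
        // Another thread won the race; drop ours and use the winner's.
        return data.get();
    }
}
```

The cost of this design is that a losing thread allocates an object it then discards, which is acceptable when concurrent initialization is rare, as the PR description argues.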
------------- PR: https://git.openjdk.org/jdk/pull/11316 From pchilanomate at openjdk.org Mon Nov 28 15:18:10 2022 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Nov 2022 15:18:10 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v4] In-Reply-To: <nli1ZOZrAQ_vNgtG9Rvi516a99ClHUIblH8LCoNj9ag=.a67626f1-5cac-412c-b8c1-1aba29629348@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <dgIMUPeDO7sqZR_QCaPi6JpoZJKCNJ9-QoN97QuZ-y8=.6bf9ec32-8a59-4646-a021-d98ff53f36c4@github.com> <nli1ZOZrAQ_vNgtG9Rvi516a99ClHUIblH8LCoNj9ag=.a67626f1-5cac-412c-b8c1-1aba29629348@github.com> Message-ID: <RrYt-Y3AFrS5E0p_nGzs0Pegqu6eAktTNQOEZdIpz_Y=.08d74052-1ead-4bef-b052-f22370ceb69f@github.com> On Mon, 21 Nov 2022 12:17:02 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Richard comments > > I went through the changes and all looks good to me. Only minor comments. > > Thanks, > Patricio > Thanks for the review @pchilano! I made the changes you requested. > Looks good, thanks Erik! ------------- PR: https://git.openjdk.org/jdk/pull/11111 From eosterlund at openjdk.org Mon Nov 28 15:49:30 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 28 Nov 2022 15:49:30 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v6] In-Reply-To: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> Message-ID: <_P0NXex3w0yz8V-4FXZdTKT4Jt_eskqOYRykIoWqVrI=.4553b958-0425-460c-be29-a1ce884d0f27@github.com> > The current loom code makes some assumptions about GC that will not work with generational ZGC.
We should make this code more GC agnostic, and provide a better interface for talking to the GC. > > In particular, > 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. > > 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. > > 3) Refactoring the stack chunk allocation code > > Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. Erik Österlund has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: - PPC support - Merge branch 'master' into 8296875_refactor_loom_code - Patricio concerns - Fix Richard comments - Indentation fix - Fix verification and RISC-V support - Generational ZGC: Loom support ------------- Changes: https://git.openjdk.org/jdk/pull/11111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11111&range=05 Stats: 978 lines in 42 files changed: 641 ins; 228 del; 109 mod Patch: https://git.openjdk.org/jdk/pull/11111.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11111/head:pull/11111 PR: https://git.openjdk.org/jdk/pull/11111 From mdoerr at openjdk.org Mon Nov 28 15:49:32 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Nov 2022 15:49:32 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v5] In-Reply-To: <ZRDAWrDdtHTCHEanKDcQhADyr3tYrFkR4qqWh2P3fkE=.5f4bfbe8-2943-49a6-8711-79b0176a17ac@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <ZRDAWrDdtHTCHEanKDcQhADyr3tYrFkR4qqWh2P3fkE=.5f4bfbe8-2943-49a6-8711-79b0176a17ac@github.com> Message-ID: <QaNW-3pnwMZXDqwgM_9iMe3YeFhbN6zzb9rTTGboaCk=.0aa25ba2-a612-45fa-b2fc-30fe712ebebf@github.com> On Mon, 28 Nov 2022 12:14:59 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. >> >> In particular, >> 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. >> >> 2) The cont_oop is located on a stack.
In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. >> >> 3) Refactoring the stack chunk allocation code >> >> Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Patricio concerns Thanks for the update! ------------- PR: https://git.openjdk.org/jdk/pull/11111 From stuefe at openjdk.org Mon Nov 28 16:08:00 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 28 Nov 2022 16:08:00 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks In-Reply-To: <x_yGlCu0KA3UrwhIN7aRRL3frOVuZepHzZIXKVy1FRc=.4572313a-d9b1-4916-93e2-4054e6f083fc@github.com> References: <pbqYDWsnkM5XdmZE6QK7wV7dNuQOCjdyPzw5OlQkSxo=.4ea19fb5-d031-42f7-850c-ae42399e0e80@github.com> <nyPkIJiAmw69cI1CZAlmR2Km0-bVCxXuM3VZR6pVPUs=.b4515771-acf9-4653-ae6c-892d9f508b66@github.com> <j_nc7lnePGF3rMpnlsERrZjufWAWxTsahtZp1z13WQk=.521e0a62-9c75-426b-ac15-f6b63b4f69da@github.com> <Fwg13t4cqJUgK4rpn-89Z61X5HEhWWoyaHWVtApQlgQ=.1581e8d4-e0b4-40d5-afd4-98df7d9d57ae@github.com> <rke6K8n4uSzbZIBiU6B3yITnzyZU_yqJ1363FJFIUY8=.38403baa-d633-4ae7-ae59-9f946391b1b3@github.com> <x_yGlCu0KA3UrwhIN7aRRL3frOVuZepHzZIXKVy1FRc=.4572313a-d9b1-4916-93e2-4054e6f083fc@github.com> Message-ID:
<PqnzHpMQvvLS_1LISPyEpUIit36jqQ_JXmGOj_PsvbU=.658235ab-5ee5-43dc-ac40-43af26724571@github.com> On Mon, 28 Nov 2022 11:57:24 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > > Be careful with that. On some OSes, touching more than N pages below the currently lowest-mapped stack page will segfault. Therefore you must touch from the top down. > > See `os::map_stack_shadow_pages()` for this. > > I think the pre-touching like this should be done as "extended" "preemptive" stack bang up to `stack_shadow_safe_limit`. `os::pretouch_memory` is fine for heap memory, but for thread stacks I think we need to hook into the usual stack overflow machinery. Oh sure, I can touch in reverse order. But I wonder if that is really needed. I used alloca() to reserve the stack space, and it should take care of OS-side stack banging for me. On Windows it does. So in theory it should be functionally equivalent to `os::map_stack_shadow_pages()`. ------------- PR: https://git.openjdk.org/jdk/pull/10403 From pminborg at openjdk.org Mon Nov 28 16:44:47 2022 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 28 Nov 2022 16:44:47 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v31] In-Reply-To: <S9AFA1STby7c240-cSlvO-e0MekMt-uKHFAqUbOnoOU=.fdab60c5-eeb5-4f5e-b189-409225b7500f@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <S9AFA1STby7c240-cSlvO-e0MekMt-uKHFAqUbOnoOU=.fdab60c5-eeb5-4f5e-b189-409225b7500f@github.com> Message-ID: <XCIXVImxWrXeXL2HyVJa4A0nrgCtpVEmEv6hX1eQSzw=.3a4ec399-5384-4849-90ac-0a8b5c48e658@github.com> On Wed, 23 Nov 2022 17:33:06 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > * remove unused Scoped interface > * re-add trusting of final fields in layout class implementations > * Fix BulkOps benchmark, which had alignment issues Looks good on API level. ------------- Marked as reviewed by pminborg (no project role). PR: https://git.openjdk.org/jdk/pull/10872 From ayang at openjdk.org Mon Nov 28 16:48:49 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 28 Nov 2022 16:48:49 GMT Subject: RFR: 8296954: G1: Enable parallel scanning for heap region remset [v2] In-Reply-To: <Z3KRru96hq1olxNqjZ4egmsFUE2CbPvFtK9LiseIUak=.c9a51504-8541-4be8-8528-0258fd99cce5@github.com> References: <fqG3Jl3VV3LulSnoR6FoVIqQ3ETGtQZf9ZRS7mKVHDM=.2fc28e74-04df-48b5-a3d7-fd8902a86cc2@github.com> <Z3KRru96hq1olxNqjZ4egmsFUE2CbPvFtK9LiseIUak=.c9a51504-8541-4be8-8528-0258fd99cce5@github.com> Message-ID: <1lr44acj8gpgxiEzIxa2nvkdZSJrL1hF9ZAcCOfqhr8=.f672caf1-b4b6-4127-8637-d40819938552@github.com> On Mon, 28 Nov 2022 08:34:37 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote: >> Hi all, >> >> Please review this change that allows parallel scanning of a heap region's remembered set. More balanced work load distribution in cases where are cards are unevenly distributed among remembered sets. >> >> Testing: Tier 1-3 >> >> Thanks > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas review I believe some performance results (benchmark setup, what metric is improved, etc) should be posted on the corresponding JBS ticket to better motivate this feature. The change looks fine. src/hotspot/share/gc/g1/g1CardSet.cpp line 246: > 244: using CHTScanTask = CardSetHash::ScanTask; > 245: > 246: const static uint BucketClaimSize = 16; Some comment on how `16` is derived would be nice and probably useful for future re-evaluation. 
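For context on the `BucketClaimSize` remark, here is a rough sketch of how a claim size is typically used when several workers scan one hash table in parallel. It assumes the constant is the number of buckets a worker claims per step; the thread does not say how `16` was chosen. The Java class `BucketClaimer` and its members are hypothetical and are not part of G1 or `ConcurrentHashTable`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical chunked work claimer; illustrates the idea only. */
final class BucketClaimer {
    // Mirrors the value under review; its derivation is not given in the thread.
    static final int BUCKET_CLAIM_SIZE = 16;

    private final int numBuckets;
    private final AtomicInteger next = new AtomicInteger();

    BucketClaimer(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    /**
     * Atomically claims the next range of buckets.
     * Returns {start, end} as a half-open range, or null when no buckets remain.
     */
    int[] claim() {
        int start = next.getAndAdd(BUCKET_CLAIM_SIZE);
        if (start >= numBuckets) {
            return null;                                   // table fully claimed
        }
        int end = Math.min(start + BUCKET_CLAIM_SIZE, numBuckets);
        return new int[] { start, end };
    }
}
```

A larger claim size means fewer atomic operations but coarser load balancing, and a smaller one means the opposite, which is why a comment recording the reasoning behind the chosen value helps later re-evaluation.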
------------- Marked as reviewed by ayang (Reviewer). PR: https://git.openjdk.org/jdk/pull/11173 From psandoz at openjdk.org Mon Nov 28 18:01:42 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 28 Nov 2022 18:01:42 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v25] In-Reply-To: <UfQPCmn8ML0YG1uxizsdAleLFUW2USTng_I8Fh_3_Vw=.453ddf9c-14b6-48f1-98cd-c40e45e2e885@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <9ZhPphzsMfl1vMcmbOjnzXi1THKCagn-RIE5TAE3a0M=.0e019155-9ea9-4e6c-ab03-a38cf7a7de33@github.com> <Lto2AKQzVhmhlJeEGUVReOe76YWav8zFweBP2Jq1JNA=.3589ae3a-eeab-436f-bf2f-54c71f29f4a6@github.com> <UfQPCmn8ML0YG1uxizsdAleLFUW2USTng_I8Fh_3_Vw=.453ddf9c-14b6-48f1-98cd-c40e45e2e885@github.com> Message-ID: <d3rC1nIN_wTS-Xv69fYBpHIlLHaeb97Twk4wfItFfDQ=.556f63ea-c5ba-414b-bcaa-57412f86933f@github.com> On Thu, 24 Nov 2022 09:27:04 GMT, Andrew Haley <aph at openjdk.org> wrote: >> src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 209: >> >>> 207: final int bitmask; >>> 208: >>> 209: private static final Object NIL = new Object(); >> >> Suggestion: >> >> static final Object NO_VALUE = new Object(); > > It not very important, but I'm going to push back (very gently) on this one. "nil: noun. nothing; naught; zero. adjective. having no value or existence." That is the exact literal meaning of this sentinel. Also, "nil" has been used with this meaning in programming languages for 60 years. What is your objection to it here? I agree its not very important, please feel free to ignore my suggestion. My thinking was to prefer something more explicit in the code when reading, since i felt the use of the term nil was more idiomatic in other languages than Java. 
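Whatever the sentinel ends up being called, its role can be sketched like this: a unique object stands for "no binding found", so that `null` remains usable as an ordinary bound value. The Java class `Bindings` and its methods are hypothetical and only loosely modelled on the idea discussed here; this is not the `ScopedValue` implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical binding table; illustrates the sentinel pattern only. */
final class Bindings {
    // Unique marker meaning "no value"; compared by identity, never exposed as data.
    static final Object NIL = new Object();

    private final Map<String, Object> bound = new HashMap<>();

    /** Binds a name to a value; the value may legitimately be null. */
    void bind(String name, Object value) {
        bound.put(name, value);
    }

    /** Returns the bound value (possibly null), or NIL if the name is unbound. */
    Object find(String name) {
        return bound.containsKey(name) ? bound.get(name) : NIL;
    }

    boolean isBound(String name) {
        return find(name) != NIL;
    }
}
```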
------------- PR: https://git.openjdk.org/jdk/pull/10952 From psandoz at openjdk.org Mon Nov 28 18:30:46 2022 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 28 Nov 2022 18:30:46 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v31] In-Reply-To: <S9AFA1STby7c240-cSlvO-e0MekMt-uKHFAqUbOnoOU=.fdab60c5-eeb5-4f5e-b189-409225b7500f@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <S9AFA1STby7c240-cSlvO-e0MekMt-uKHFAqUbOnoOU=.fdab60c5-eeb5-4f5e-b189-409225b7500f@github.com> Message-ID: <FKBN82tL18C-Krv3xHtlfkgomts1em20E1gFc2WKqy8=.e3570b94-7481-4742-a852-9405eb269b43@github.com> On Wed, 23 Nov 2022 17:33:06 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > * remove unused Scoped interface > * re-add trusting of final fields in layout class implementations > * Fix BulkOps benchmark, which had alignment issues Marked as reviewed by psandoz (Reviewer). src/java.base/share/classes/jdk/internal/foreign/FunctionDescriptorImpl.java line 57: > 55: * {@return the return layout (if any) associated with this function descriptor} > 56: */ > 57: public final Optional<MemoryLayout> returnLayout() { No need for `final` since class is final. Suggestion: public Optional<MemoryLayout> returnLayout() { src/java.base/share/classes/jdk/internal/foreign/SlicingAllocator.java line 33: > 31: public final class SlicingAllocator implements SegmentAllocator { > 32: > 33: public static final long DEFAULT_BLOCK_SIZE = 4 * 1024; Not used. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From svkamath at openjdk.org Mon Nov 28 18:57:35 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 28 Nov 2022 18:57:35 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" [v3] In-Reply-To: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> Message-ID: <dZCfgJ0XlK0fFeZ1vdHXu695py9s7jiTrB7Tp_Pli7Q=.3262e663-b6fc-4560-942d-3e14f127f552@github.com> > 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Addressed review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11301/files - new: https://git.openjdk.org/jdk/pull/11301/files/5af25e9b..e432bf7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11301&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11301&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11301.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11301/head:pull/11301 PR: https://git.openjdk.org/jdk/pull/11301 From sviswanathan at openjdk.org Mon Nov 28 19:11:55 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 28 Nov 2022 19:11:55 GMT Subject: RFR: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" In-Reply-To: <hxEaxA2hjYIXr0vfIcVfyAcfoK9RYpVH8URLeVkElSk=.0ca3b358-29f1-49ae-9f9e-e76a053280d5@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> <hxEaxA2hjYIXr0vfIcVfyAcfoK9RYpVH8URLeVkElSk=.0ca3b358-29f1-49ae-9f9e-e76a053280d5@github.com> Message-ID: 
<JRxSDm3SKuIBG3zf4xClDoSolgx9NXJ4HIG2Fr43JNM=.b4077467-280b-42c0-b43e-8d04d3ae6030@github.com> On Tue, 22 Nov 2022 22:05:34 GMT, Smita Kamath <svkamath at openjdk.org> wrote: >> 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" > > Hi All, > > I have updated f2hf and hf2f methods in sharedRuntime.cpp as a fix for the error unexpected result of converting. Kindly review this patch and provide feedback. Thank you. > > Regards, > Smita The whitespace related change looks good. @smita-kamath Please go ahead and integrate. ------------- PR: https://git.openjdk.org/jdk/pull/11301 From svkamath at openjdk.org Mon Nov 28 19:28:37 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 28 Nov 2022 19:28:37 GMT Subject: Integrated: 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" In-Reply-To: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> References: <N-uw96v8RJP528ABLHfs4Fwrber9INPk7W8S4RXQR1I=.66690985-07b8-4156-b25e-3ada0576cdff@github.com> Message-ID: <IGlTPrFUNZvVXuF1ODbuduE4aG2CS01n2vqyczUQK7M=.90860b69-4318-4411-b7be-be0e1dc80a77@github.com> On Tue, 22 Nov 2022 21:52:59 GMT, Smita Kamath <svkamath at openjdk.org> wrote: > 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" This pull request has now been integrated. 
Changeset: 105d9d75 Author: Smita Kamath <svkamath at openjdk.org> Committer: Sandhya Viswanathan <sviswanathan at openjdk.org> URL: https://git.openjdk.org/jdk/commit/105d9d75e84a46400f52fafda2ea00c99c14eaf0 Stats: 20 lines in 2 files changed: 12 ins; 1 del; 7 mod 8295351: java/lang/Float/Binary16Conversion.java fails with "Unexpected result of converting" Reviewed-by: sviswanathan, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/11301 From mcimadamore at openjdk.org Mon Nov 28 19:29:08 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 28 Nov 2022 19:29:08 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v32] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <cJ48CQwDtj894fCv2OVZb0czHdZeP0onHPA8KDIEyjg=.1c2bf974-6039-4cf4-894c-1329f173efbc@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/97168155..6699ad99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=30-31 Stats: 8 lines in 2 files changed: 0 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From iklam at openjdk.org Mon Nov 28 21:53:03 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 28 Nov 2022 21:53:03 GMT Subject: RFR: 8286185: The Java manpage can be more platform inclusive [v3] In-Reply-To: <gs7MICgnmPqU3QYwn9XYjduMdxuyIW_mY7hWu5gBqTA=.74135117-30da-4bc4-a3d2-c06d7bf1dd8a@github.com> References: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> <gs7MICgnmPqU3QYwn9XYjduMdxuyIW_mY7hWu5gBqTA=.74135117-30da-4bc4-a3d2-c06d7bf1dd8a@github.com> Message-ID: <pvMAGHYwX9rhIhSECelm_TA1OfqE2rv7HUB7E26-VmI=.8b9ea8d1-52cf-4364-aa06-b08bcf97d7a4@github.com> On Thu, 24 Nov 2022 00:50:22 GMT, David Holmes <dholmes at openjdk.org> wrote: >> This is mainly an expansion of the included platforms by changing "linux and macOS" to "Non-Windows". There are a few additional examples, and clarification that they are just examples. There are also some minor edits and corrections I spotted. >> >> One actual fix relates to the "control-break" -> "control-" change. I can factor that out if needed (or just add an additional issue to the PR). >> >> This doesn't attempt to give complete platform recognition for all OpenJDK platforms. 
Two areas where anyone interested could file a further RFE is the support of DTrace on BSD systems other than macOS; and the use of RTM locking on Power8 architecture (existing documentation is all about Intel TSX on x86). >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Fix formatting LGTM ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.org/jdk/pull/11340 From dholmes at openjdk.org Mon Nov 28 22:02:00 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Nov 2022 22:02:00 GMT Subject: RFR: 8286185: The Java manpage can be more platform inclusive [v3] In-Reply-To: <pvMAGHYwX9rhIhSECelm_TA1OfqE2rv7HUB7E26-VmI=.8b9ea8d1-52cf-4364-aa06-b08bcf97d7a4@github.com> References: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> <gs7MICgnmPqU3QYwn9XYjduMdxuyIW_mY7hWu5gBqTA=.74135117-30da-4bc4-a3d2-c06d7bf1dd8a@github.com> <pvMAGHYwX9rhIhSECelm_TA1OfqE2rv7HUB7E26-VmI=.8b9ea8d1-52cf-4364-aa06-b08bcf97d7a4@github.com> Message-ID: <WpoWvSGQyQer4OpPWAVM1N5ojmWe_UVxDmtgIIA7kR0=.114f9d86-a21d-4f51-b7be-bc20820b3727@github.com> On Mon, 28 Nov 2022 21:48:40 GMT, Ioi Lam <iklam at openjdk.org> wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix formatting > > LGTM Thanks @iklam ! 
------------- PR: https://git.openjdk.org/jdk/pull/11340 From dholmes at openjdk.org Mon Nov 28 22:06:55 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Nov 2022 22:06:55 GMT Subject: Integrated: 8286185: The Java manpage can be more platform inclusive In-Reply-To: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> References: <yiCX947lHQ3SDVQ5Iz_VWZaSYwlfqoYmKwM-hQN_7OQ=.2a335dc2-23b9-46b6-a7ac-d9a80e491fc1@github.com> Message-ID: <xntq-tardjmMrM46Ato6KHjeAP73LkG_-o3-ARLL-3g=.cb27abff-373a-4730-91c3-22d6a7dd7540@github.com> On Thu, 24 Nov 2022 00:24:00 GMT, David Holmes <dholmes at openjdk.org> wrote: > This is mainly an expansion of the included platforms by changing "linux and macOS" to "Non-Windows". There are a few additional examples, and clarification that they are just examples. There are also some minor edits and corrections I spotted. > > One actual fix relates to the "control-break" -> "control-" change. I can factor that out if needed (or just add an additional issue to the PR). > > This doesn't attempt to give complete platform recognition for all OpenJDK platforms. Two areas where anyone interested could file a further RFE is the support of DTrace on BSD systems other than macOS; and the use of RTM locking on Power8 architecture (existing documentation is all about Intel TSX on x86). > > Thanks. This pull request has now been integrated. 
Changeset: 05128c21 Author: David Holmes <dholmes at openjdk.org> URL: https://git.openjdk.org/jdk/commit/05128c2110e1d64111a30d641898ed94925243d6 Stats: 57 lines in 1 file changed: 19 ins; 3 del; 35 mod 8286185: The Java manpage can be more platform inclusive Reviewed-by: sspitsyn, kvn, iklam ------------- PR: https://git.openjdk.org/jdk/pull/11340 From dholmes at openjdk.org Mon Nov 28 22:09:05 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Nov 2022 22:09:05 GMT Subject: RFR: 8287400: Make BitMap range parameter names consistent In-Reply-To: <bnkgd8W17aWL-NEO-8y8CTjuDD143yPAllnqOoIDpLU=.06cfa8da-57d2-410b-9077-82ded32790d0@github.com> References: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> <bnkgd8W17aWL-NEO-8y8CTjuDD143yPAllnqOoIDpLU=.06cfa8da-57d2-410b-9077-82ded32790d0@github.com> Message-ID: <D4xNLkkfZlnd6H5nnSxgrdo0Jlj5qZ0GwcaGzZ6JT6Y=.e7148802-65a3-454d-b63f-3fa12ab7fa00@github.com> On Mon, 28 Nov 2022 11:32:13 GMT, Afshin Zafari <duke at openjdk.org> wrote: > Should I keep beg and end and change the fewer instances of start? That would be the more minimal fix. As I said I only see "start" used in one method as `start_offset`. Thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/11375 From dholmes at openjdk.org Mon Nov 28 22:20:08 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Nov 2022 22:20:08 GMT Subject: RFR: 8292741: Convert JvmtiTagMapTable to ResourceHashtable In-Reply-To: <WC0j16HlKWRzgKhC9xjs_ZQr-42Bk7_cMOOCK3rY-yo=.c5f6e693-7d51-4277-86ab-ad250967041c@github.com> References: <Dj9YfY10SLn8dR1Ez3kEQe4tXm1q5W1TS8_4r3gBUsY=.6c38825a-86e5-4575-a509-683786d242a6@github.com> <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> <WC0j16HlKWRzgKhC9xjs_ZQr-42Bk7_cMOOCK3rY-yo=.c5f6e693-7d51-4277-86ab-ad250967041c@github.com> Message-ID: <c4ophxMSfP2waXPjUz3x4N57crZdQQb-d_xsOIfGmCw=.0de5f1fa-d697-4ca2-aa7a-3c5a4e030236@github.com> On Mon, 28 Nov 2022 11:35:48 GMT, Afshin Zafari <duke at openjdk.org> wrote: >> src/hotspot/share/prims/jvmtiTagMap.cpp line 254: >> >>> 252: if (obj_tag != current_tag ) { >>> 253: hashmap->remove(o); >>> 254: hashmap->add(o, obj_tag); >> >> This change is not atomic - is that a problem? The concurrency aspects of using this map are not clear. > > add() is supposed to add a new object with tag. There is an assert() in its body that checks it. By removing the assert, add() can be used for updating as well. Are you suggesting that `add` can also act as a `replace` operation? I would think we would want a separate method for that. >> src/hotspot/share/prims/jvmtiTagMapTable.cpp line 95: >> >>> 93: //if (obj->fast_no_hash_check()) { >>> 94: // return 0; >>> 95: //} else { >> >> What are these comments? > > Coleen's suggestion for efficiency reasons. If the comment is meant to suggest some possible future optimisation it should say that. >> src/hotspot/share/prims/jvmtiTagMapTable.cpp line 102: >> >>> 100: } >>> 101: >>> 102: bool JvmtiTagMapTable::add(oop obj, jlong tag) { >> >> I'm not seeing that a return value has any use here when it is always expected to be true. 
> > ResourceHashTable::put() returns true if the Key,Value is added, false if the Value is updated. But this doesn't do that, so ?? >> src/hotspot/share/prims/jvmtiTagMapTable.cpp line 123: >> >>> 121: >>> 122: void JvmtiTagMapTable::remove_dead_entries(GrowableArray<jlong>* objects) { >>> 123: struct IsDead{ >> >> Nit: space before { >> >> Same query about using a local struct for this. > > Alternative for struct? Not sure ... didn't we start using C++ lambda's for some of these "closure" operations? @coleenp what is the usual pattern we use for this kind of thing? ------------- PR: https://git.openjdk.org/jdk/pull/11288 From luhenry at openjdk.org Mon Nov 28 22:33:40 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 28 Nov 2022 22:33:40 GMT Subject: RFR: 8297697: RISC-V: Add support for SATP mode detection In-Reply-To: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> References: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> Message-ID: <n9kbmUxMNp6tTH0U_MT8mPajw7lB6KZRuPscNXMngPo=.8e74b174-a8df-49f4-8893-cc9d5b1502dd@github.com> On Mon, 28 Nov 2022 11:31:17 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: > RISC-V gets sv57-based virtual memory support since Linux 5.18 [1]. There are some reports of the OpenJDK RISC-V port crashing on Linux 5.18+ with QEMU-system 7.10+ when sv57 was enabled [2][3] as currently RISC-V port only supports up to sv48. > As discussed in [3], given the fact that there are no existing boards or hardware even support anything more than sv48, > we decide to add detection for SATP (Supervisor Address Translation and Protection) mode at JVM startup time if possible and explicitly issue a warning and stop early when sv57 is enabled. 
> > When sv57 is enabled, the output of java -version would be: > > > root at qemuriscv64:~# jdk/bin/java -version > Error occurred during initialization of VM > Unsupported satp mode: 10 > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa5b537b0ecc16992577b013f11112d54c7ce869 > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000639.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000681.html > > Testing: > > - QEMU-system with sv48/sv57-enabled Linux image `-version` test > - HiFive Unmatched board (sv39) `-version` test src/hotspot/cpu/riscv/vm_version_riscv.cpp line 43: > 41: VM_MODE mode = get_satp_mode(); > 42: if (mode > RISCV64_ONLY(VM_SV48) RISCV32_ONLY(VM_SV32)) { > 43: vm_exit_during_initialization(err_msg("Unsupported satp mode: %d", mode)); I would map that number to a string. For example 10 to ?sv48? so it is more googlable and searchable. ------------- PR: https://git.openjdk.org/jdk/pull/11388 From amenkov at openjdk.org Mon Nov 28 23:38:12 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Mon, 28 Nov 2022 23:38:12 GMT Subject: RFR: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests Message-ID: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> The fix combines almost the same tests to 1 test to remove code duplication ------------- Commit messages: - Combined nsk resetPeakThreadCount tests Changes: https://git.openjdk.org/jdk/pull/11400/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11400&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297742 Stats: 289 lines in 7 files changed: 42 ins; 245 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11400.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11400/head:pull/11400 PR: https://git.openjdk.org/jdk/pull/11400 From ascarpino at openjdk.org Tue Nov 29 00:32:36 2022 From: ascarpino at openjdk.org (Anthony Scarpino) Date: 
Tue, 29 Nov 2022 00:32:36 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v5] In-Reply-To: <7-omMJqslZjYB-nYcKyasj5bkNSDyCTXyj53yg7hpPI=.d9b796da-cf58-4591-8767-72beb67faf1e@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <7-omMJqslZjYB-nYcKyasj5bkNSDyCTXyj53yg7hpPI=.d9b796da-cf58-4591-8767-72beb67faf1e@github.com> Message-ID: <Z66Np6KrK2x4QYpLcF8ytj1_jRHZgV3P01xxPiGctOY=.7d89ea7f-77ad-4aaf-8ea4-e5bba0bf183b@github.com> On Tue, 22 Nov 2022 05:28:05 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: >> >> - x86_64: AVX, AVX2 and AVX512 >> - aarch64: platforms that support the advanced SIMD instructions >> >> Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. >> >> Special thanks to the folks who have made many helpful comments while this PR was in draft form. > > Jamil Nimeh has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: > > - Merge with main > - Add AVX assertion guard > - Pull out common macro code into function parameter pack > - replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations > - Change intrinsic helper method name conform to convention > - consolidate chacha macroAssembler routines into chacha stubGenerator file > - More indentation fixes on aarch64 > - rename chapoly->chacha for macro file > - rename chacha macro file to be consistent with x86_64 naming > - Fix indentation issues > - ... 
and 40 more: https://git.openjdk.org/jdk/compare/392ac705...bb3f4264 src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 92: > 90: private long counter; > 91: > 92: // The 16-int state array and output keystream array: I think it would help readability if these comments were separated for each declaration ------------- PR: https://git.openjdk.org/jdk/pull/7702 From jwaters at openjdk.org Tue Nov 29 00:54:53 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Nov 2022 00:54:53 GMT Subject: Withdrawn: 8252584: HotSpot Style Guide should permit alignas In-Reply-To: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> References: <wtpB01HooEIU2FxGS9E5KPIv1ogztEw-XYFRTUNw5Cw=.6e6625ee-5343-49d6-98f3-2962f269184d@github.com> Message-ID: <18jTqbTmLk2rTyF5ClMHLYZ81VVfDVyYy9Ia3KLT-jk=.a2e930d9-de12-48be-a5d0-56976c3163f8@github.com> On Wed, 23 Nov 2022 10:24:42 GMT, Julian Waters <jwaters at openjdk.org> wrote: > Add alignas to the permitted features set. Though the corresponding entry mentions this should not be done for classes, there's no actual difference in practice with all our supported compilers, because their nonstandard syntax also has the same limitations and issues with dynamic allocation as the C++ alignas, and including such a restriction of falling back to ATTRIBUTE_ALIGNED in the case of classes in the style guide would ultimately not really serve much of a point This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/11315 From dlong at openjdk.org Tue Nov 29 01:05:17 2022 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Nov 2022 01:05:17 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v2] In-Reply-To: <_3SkZlJhWP_RHEgQDrTfSPp9pv8sS8jM4ewrIhucIcg=.a256713b-1f9a-45a7-a13f-e101ceca297b@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> <_3SkZlJhWP_RHEgQDrTfSPp9pv8sS8jM4ewrIhucIcg=.a256713b-1f9a-45a7-a13f-e101ceca297b@github.com> Message-ID: <ilh-e9IZV-lFMQEq3rMbOxX927SNc75syjfkmazMBEI=.a90a961d-0494-491e-9020-a6952501cd17@github.com> On Mon, 28 Nov 2022 12:06:03 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > I added an assert to `MethodData::allocate` and fixed the ciReplay code which only ever gets executed by a single thread by removing the lock. I also removed the `OrderAccess::release()`. I'm not sure if it's guaranteed that replay runs single-threaded. I haven't looked into the details, but I think ReplayInline can run in any compiler thread after startup, and then of course there is JDK-8254110. ------------- PR: https://git.openjdk.org/jdk/pull/11316 From jwaters at openjdk.org Tue Nov 29 01:13:39 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Nov 2022 01:13:39 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas Message-ID: <CVHNRNKRV_f2n8F5s0AiQR7lgIrHBHzaUj5ewTDMU7I=.9920de89-7b18-4f22-9ca8-94479d4a292f@github.com> Add alignas to the permitted features set with some restrictions. 
(Thanks @kimbarrett for the help) ------------- Commit messages: - HotSpot Style Guide changes Changes: https://git.openjdk.org/jdk/pull/11404/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11404&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8252584 Stats: 94 lines in 2 files changed: 92 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11404.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11404/head:pull/11404 PR: https://git.openjdk.org/jdk/pull/11404 From lmesnik at openjdk.org Tue Nov 29 01:15:33 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 29 Nov 2022 01:15:33 GMT Subject: RFR: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests In-Reply-To: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> References: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> Message-ID: <uEuL8nGO4RPRsfLTVrG5hexvCwZ_teK0pTilA5DwhS4=.a7719c57-643a-4e03-8f25-bdd3c61badc6@github.com> On Mon, 28 Nov 2022 23:28:00 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > The fix combines almost the same tests to 1 test to remove code duplication test/hotspot/jtreg/vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount/reset001.java line 45: > 43: * > 44: * @comment Direct access to the metrics. > 45: * @run main/othervm nsk.monitoring.ThreadMXBean.resetPeakThreadCount.reset001 I think it would be better to have several @test id instead of multi @run for such cases. It allows for reporting the individual results for each run and executing them parallelly. 
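A minimal sketch of the "several @test id" layout suggested above, assuming jtreg's `@test id=...` form. The class name `PeakResetSketch`, the ids `direct` and `proxy`, and the single command-line argument are hypothetical and are not taken from the actual reset001.java.

```java
/*
 * @test id=direct
 * @summary Sketch only: hypothetical "direct" variant of the check.
 * @run main/othervm PeakResetSketch direct
 */

/*
 * @test id=proxy
 * @summary Sketch only: hypothetical "proxy" variant of the same check.
 * @run main/othervm PeakResetSketch proxy
 */

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class PeakResetSketch {
    // Resets the peak thread count and returns the peak read back afterwards.
    static int peakAfterReset() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        bean.resetPeakThreadCount();
        return bean.getPeakThreadCount();
    }

    public static void main(String[] args) {
        // The variant name is only echoed here; a real test would use it
        // to choose how the MXBean is accessed.
        String variant = args.length > 0 ? args[0] : "direct";
        int peak = peakAfterReset();
        if (peak < 1) {
            throw new RuntimeException(variant + ": unexpected peak " + peak);
        }
        System.out.println(variant + ": peak after reset = " + peak);
    }
}
```

With this layout each comment block is a separate test with its own result, so the variants can be selected and run independently instead of as sequential @run steps of one test.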
------------- PR: https://git.openjdk.org/jdk/pull/11400 From dholmes at openjdk.org Tue Nov 29 01:35:27 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Nov 2022 01:35:27 GMT Subject: RFR: 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing [v2] In-Reply-To: <Z3p1R4bbI5blRkjjklVqHrqGfgy7lCmM5JPgbq67im0=.36d7cede-17a5-442d-8dfc-9c4e8afece2b@github.com> References: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com> <Z3p1R4bbI5blRkjjklVqHrqGfgy7lCmM5JPgbq67im0=.36d7cede-17a5-442d-8dfc-9c4e8afece2b@github.com> Message-ID: <qSl1SLXpoVPP1mFgDuykpnlXBrOO87_QZntg-Id6ZVs=.5ed5d054-68c5-46b8-afce-fd8e04392da1@github.com> On Mon, 28 Nov 2022 11:52:22 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> There is a stack walk in JvmtiExport::post_exception_throw() that has safepoints in it. This trips up the stack watermark code. This patch adds a RAII object to JvmtiExport::post_exception_throw() that keeps the thread and its stack fully processed throughout the function. >> Testing: tier1-7 of ZGC tests on linux x86_64 debug and manual testing of the test that failed. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Thanks for the added comment. I can approve this as-is but I think we have a significant problem here as it remains completely unclear when `KeepStackGCProcessedMark` is needed or how its omission would be detected. This seems extremely fragile. ------------- Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11238 From fjiang at openjdk.org Tue Nov 29 01:41:48 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 29 Nov 2022 01:41:48 GMT Subject: RFR: 8297697: RISC-V: Add support for SATP mode detection [v2] In-Reply-To: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> References: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> Message-ID: <TjIFSBdChwOzZweXDlzPJpm4UBVzbXZacmVa9Z2McNo=.c2766f6a-bb06-4f44-9308-f93d8259dded@github.com> > RISC-V gets sv57-based virtual memory support since Linux 5.18 [1]. There are some reports of the OpenJDK RISC-V port crashing on Linux 5.18+ with QEMU-system 7.10+ when sv57 was enabled [2][3] as currently RISC-V port only supports up to sv48. > As discussed in [3], given the fact that there are no existing boards or hardware even support anything more than sv48, > we decide to add detection for SATP (Supervisor Address Translation and Protection) mode at JVM startup time if possible and explicitly issue a warning and stop early when sv57 is enabled. 
> > When sv57 is enabled, the output of java -version would be: > > > root at qemuriscv64:~# jdk/bin/java -version > Error occurred during initialization of VM > Unsupported satp mode: 10 > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa5b537b0ecc16992577b013f11112d54c7ce869 > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000639.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000681.html > > Testing: > > - QEMU-system with sv48/sv57-enabled Linux image `-version` test > - HiFive Unmatched board (sv39) `-version` test Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: print vm mode string instead of vm mode code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11388/files - new: https://git.openjdk.org/jdk/pull/11388/files/c41738c8..abfa5f97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11388&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11388&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/11388.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11388/head:pull/11388 PR: https://git.openjdk.org/jdk/pull/11388 From fjiang at openjdk.org Tue Nov 29 01:43:32 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 29 Nov 2022 01:43:32 GMT Subject: RFR: 8297697: RISC-V: Add support for SATP mode detection [v2] In-Reply-To: <n9kbmUxMNp6tTH0U_MT8mPajw7lB6KZRuPscNXMngPo=.8e74b174-a8df-49f4-8893-cc9d5b1502dd@github.com> References: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> <n9kbmUxMNp6tTH0U_MT8mPajw7lB6KZRuPscNXMngPo=.8e74b174-a8df-49f4-8893-cc9d5b1502dd@github.com> Message-ID: <ocMyuXxO2h1ZfJdbw6hZ9ZqEVj8ewDVrr-CSXRVHbBw=.632c6721-6176-4243-9a80-55cf93b43ea4@github.com> On Mon, 28 Nov 2022 22:27:52 GMT, Ludovic Henry <luhenry at openjdk.org> wrote: >> Feilong Jiang has 
updated the pull request incrementally with one additional commit since the last revision: >> >> print vm mode string instead of vm mode code > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 43: > >> 41: VM_MODE mode = get_satp_mode(); >> 42: if (mode > RISCV64_ONLY(VM_SV48) RISCV32_ONLY(VM_SV32)) { >> 43: vm_exit_during_initialization(err_msg("Unsupported satp mode: %d", mode)); > > I would map that number to a string. For example 10 to "sv48" so it is more googlable and searchable. Thanks for the suggestion, actually we can just print `_vm_mode` since it already has mmu string. ------------- PR: https://git.openjdk.org/jdk/pull/11388 From jnimeh at openjdk.org Tue Nov 29 01:54:39 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Tue, 29 Nov 2022 01:54:39 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v6] In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> Message-ID: <cMgHkwaKCbVJ3j94ZDD18Ibz6Rj66cyO45Inghl4d4Q=.21b78364-60cb-420a-a3a3-2e41b529cf18@github.com> > This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets: > > - x86_64: AVX, AVX2 and AVX512 > - aarch64: platforms that support the advanced SIMD instructions > > Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. > > Special thanks to the folks who have made many helpful comments while this PR was in draft form.
Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: Split comment paragraph up for readability/clarity ------------- Changes: - all: https://git.openjdk.org/jdk/pull/7702/files - new: https://git.openjdk.org/jdk/pull/7702/files/bb3f4264..b818411b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=7702&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=7702&range=04-05 Stats: 10 lines in 1 file changed: 4 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/7702.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7702/head:pull/7702 PR: https://git.openjdk.org/jdk/pull/7702 From jnimeh at openjdk.org Tue Nov 29 01:54:43 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Tue, 29 Nov 2022 01:54:43 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v5] In-Reply-To: <Z66Np6KrK2x4QYpLcF8ytj1_jRHZgV3P01xxPiGctOY=.7d89ea7f-77ad-4aaf-8ea4-e5bba0bf183b@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <7-omMJqslZjYB-nYcKyasj5bkNSDyCTXyj53yg7hpPI=.d9b796da-cf58-4591-8767-72beb67faf1e@github.com> <Z66Np6KrK2x4QYpLcF8ytj1_jRHZgV3P01xxPiGctOY=.7d89ea7f-77ad-4aaf-8ea4-e5bba0bf183b@github.com> Message-ID: <l96RpLse1fCG0j9YfagS-gNVPNzXL1xQl_fDK8qMaHw=.5e4808e8-7e73-43c5-9e48-dd20e54a518b@github.com> On Mon, 28 Nov 2022 22:58:26 GMT, Anthony Scarpino <ascarpino at openjdk.org> wrote: >> Jamil Nimeh has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 50 commits: >> >> - Merge with main >> - Add AVX assertion guard >> - Pull out common macro code into function parameter pack >> - replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations >> - Change intrinsic helper method name conform to convention >> - consolidate chacha macroAssembler routines into chacha stubGenerator file >> - More indentation fixes on aarch64 >> - rename chapoly->chacha for macro file >> - rename chacha macro file to be consistent with x86_64 naming >> - Fix indentation issues >> - ... and 40 more: https://git.openjdk.org/jdk/compare/392ac705...bb3f4264 > > src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line 92: > >> 90: private long counter; >> 91: >> 92: // The 16-int state array and output keystream array: > > I think it would help readability if these comments were separated for each declaration Agreed. I've split those up for each declaration as suggested. ------------- PR: https://git.openjdk.org/jdk/pull/7702 From dholmes at openjdk.org Tue Nov 29 02:03:20 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Nov 2022 02:03:20 GMT Subject: RFR: 8297600: Check current thread in selected JRT_LEAF methods [v2] In-Reply-To: <N1mTZ0VCshuA_t4ifX4i6U1yhLGgGf0L99OK6vAhbmw=.5149a141-1567-46d6-af32-6449bcaa3c19@github.com> References: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> <N1mTZ0VCshuA_t4ifX4i6U1yhLGgGf0L99OK6vAhbmw=.5149a141-1567-46d6-af32-6449bcaa3c19@github.com> Message-ID: <u7fF43G5zC2xFaMPxyVK8V4QvT4c56jKjUEjtRnqh-4=.f193dbaa-20f5-4400-ac29-a785473d9150@github.com> On Mon, 28 Nov 2022 12:26:41 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> With [JDK-8275286](https://bugs.openjdk.org/browse/JDK-8275286), we added the `Thread::current()` checks for most of the JRT entries. But `JRT_LEAF` is still not checked, because not every `JRT_LEAF` carries a `JavaThread` argument. 
Having assertions there helps for two reasons. First, these methods can be called from the stub/compiler code, which might be erroneous with thread handling (especially in x86_32 that does not have a dedicated thread register). Second, in the post-Loom world, current thread can change suddenly, as evidenced here: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2022-November/060779.html. >> >> We can add the thread checks to relevant `JRT_LEAF` methods that accept `JavaThread*` too. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_64 fastdebug `tier2` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier2` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Revert some additions That seems more recognisably correct. It would be even better if `thread` were named `current` in those cases where it must be, but that is a separate RFE. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/11359 From dholmes at openjdk.org Tue Nov 29 02:08:24 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Nov 2022 02:08:24 GMT Subject: RFR: 8297106: Remove the -Xcheck:jni local reference capacity checking [v3] In-Reply-To: <LflDdTR9zKboGb2anstVwteRLB7LaWoaLpkAKMM1oiU=.534466b9-2269-439c-bd85-77d9e8fbffb9@github.com> References: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> <l3xmqRE7yL-cYPAB3c73hQXU01bqircgZB7vSQGgBpw=.a623c9fa-1b09-4405-9307-26986800db67@github.com> <LflDdTR9zKboGb2anstVwteRLB7LaWoaLpkAKMM1oiU=.534466b9-2269-439c-bd85-77d9e8fbffb9@github.com> Message-ID: <3M6Csgqvoq0g5GRnI2T6-viyaYm_8zM2s-MAjEB3LgQ=.9b925a3b-cbec-4aa6-b3d5-b6252feb8eb7@github.com> On Wed, 23 Nov 2022 09:48:56 GMT, Kevin Walls <kevinw at openjdk.org> wrote: >> David Holmes has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Manpage update >> - Merge branch 'master' into 8297106-ensure-local-capacity >> - Removed additional test that no longer applies. >> - Forgot to commit deleted test file. >> - Forgot to commit removed test. >> - 8297106: Remove the -Xcheck:jni local reference capacity checking > > Yes good to get rid of these as they have caused confusion and concern over the years, and don't represent a real problem. Thanks @kevinjwalls ! ------------- PR: https://git.openjdk.org/jdk/pull/11259 From dholmes at openjdk.org Tue Nov 29 02:08:25 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Nov 2022 02:08:25 GMT Subject: Integrated: 8297106: Remove the -Xcheck:jni local reference capacity checking In-Reply-To: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> References: <Q3QeqD9mR6wVjo4NLrjGYu-VK7gPDVlJuDoinXuLuxI=.6872d146-e73e-483f-b72a-aa771c27edc5@github.com> Message-ID: <e3J8kaxgQLB1N_mIa0Ql9No61D01DiUaP-iJ05oyHBM=.a518d7f4-0ce0-4bd0-9979-80314d150eb8@github.com> On Mon, 21 Nov 2022 06:53:02 GMT, David Holmes <dholmes at openjdk.org> wrote: > This PR removes the "fake" planned capacity checking mechanism. Please see the JBS issue for the detailed discussion. > > Testing: tiers 1-3 > > Thanks. This pull request has now been integrated. 
Changeset: 692bedbc Author: David Holmes <dholmes at openjdk.org> URL: https://git.openjdk.org/jdk/commit/692bedbc1df153f362b8e85693f20b089b5594e2 Stats: 289 lines in 8 files changed: 0 ins; 288 del; 1 mod 8297106: Remove the -Xcheck:jni local reference capacity checking Reviewed-by: dcubed, kevinw ------------- PR: https://git.openjdk.org/jdk/pull/11259 From amenkov at openjdk.org Tue Nov 29 02:36:12 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 29 Nov 2022 02:36:12 GMT Subject: RFR: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests [v2] In-Reply-To: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> References: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> Message-ID: <lN-yC7mKMa8G09-YwxZCsi9fQ02j9ZFvUy4JrrRPbSQ=.a254424d-00f3-495b-a85a-f2f9f77ab288@github.com> > The fix combines almost the same tests to 1 test to remove code duplication Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: Used multiple test tags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11400/files - new: https://git.openjdk.org/jdk/pull/11400/files/eba53c1e..393f1252 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11400&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11400&range=00-01 Stats: 27 lines in 1 file changed: 21 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/11400.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11400/head:pull/11400 PR: https://git.openjdk.org/jdk/pull/11400 From amenkov at openjdk.org Tue Nov 29 02:36:13 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 29 Nov 2022 02:36:13 GMT Subject: RFR: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests [v2] In-Reply-To: <uEuL8nGO4RPRsfLTVrG5hexvCwZ_teK0pTilA5DwhS4=.a7719c57-643a-4e03-8f25-bdd3c61badc6@github.com> References: 
<T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> <uEuL8nGO4RPRsfLTVrG5hexvCwZ_teK0pTilA5DwhS4=.a7719c57-643a-4e03-8f25-bdd3c61badc6@github.com> Message-ID: <EAbPzhkQ7vPRozC3a7RxrrQjaaDb_hhnRunAzSaRwEk=.18eb2da7-8be3-4f37-a9ad-b5bdfcb1ba9a@github.com> On Tue, 29 Nov 2022 01:12:52 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> Used multiple test tags > > test/hotspot/jtreg/vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount/reset001.java line 45: > >> 43: * >> 44: * @comment Direct access to the metrics. >> 45: * @run main/othervm nsk.monitoring.ThreadMXBean.resetPeakThreadCount.reset001 > > I think it would be better to have several @test id instead of multi @run for such cases. It allows for reporting the individual results for each run and executing them parallelly. Good point. fixed. ------------- PR: https://git.openjdk.org/jdk/pull/11400 From ascarpino at openjdk.org Tue Nov 29 02:52:19 2022 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Tue, 29 Nov 2022 02:52:19 GMT Subject: RFR: 8247645: ChaCha20 intrinsics [v6] In-Reply-To: <cMgHkwaKCbVJ3j94ZDD18Ibz6Rj66cyO45Inghl4d4Q=.21b78364-60cb-420a-a3a3-2e41b529cf18@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> <cMgHkwaKCbVJ3j94ZDD18Ibz6Rj66cyO45Inghl4d4Q=.21b78364-60cb-420a-a3a3-2e41b529cf18@github.com> Message-ID: <BIin-R-wJCrSpH6IVPuCtgV-uYP10WGHYBpO6v6RcKI=.61d99a59-6a84-4fb4-b5f8-4f5be58d8df0@github.com> On Tue, 29 Nov 2022 01:54:39 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. 
Intrinsics have been written for the following platforms and instruction sets: >> >> - x86_64: AVX, AVX2 and AVX512 >> - aarch64: platforms that support the advanced SIMD instructions >> >> Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email. >> >> Special thanks to the folks who have made many helpful comments while this PR was in draft form. > > Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: > > Split comment paragraph up for readability/clarity Marked as reviewed by ascarpino (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/7702 From dholmes at openjdk.org Tue Nov 29 04:11:16 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Nov 2022 04:11:16 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v3] In-Reply-To: <uNoa_1yi6xIqGrMPQbNTIJBoU9kDYFueWhx4oyMfXTU=.48904340-c641-40b7-a225-4b186ea1bb32@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> <uNoa_1yi6xIqGrMPQbNTIJBoU9kDYFueWhx4oyMfXTU=.48904340-c641-40b7-a225-4b186ea1bb32@github.com> Message-ID: <lqhc0mwY-2Ji0Hw6I7KbqQwrOvaD-x8oiV5Uq9TN27w=.5cfea741-bdb1-432b-9b56-7b984cfe2098@github.com> On Mon, 28 Nov 2022 14:33:39 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: >> `Method::build_profiling_method_data` acquires the `MethodData_lock` when initializing `Method::_method_data` to prevent multiple allocations by different threads. The problem is that when metaspace allocation fails and `JvmtiExport::should_post_resource_exhausted()` is set, we assert during the `ThreadToNativeFromVM` transition in JVMTI code. 
>> >> Since concurrent initialization is a rare event, I suggest to get rid of the lock and perform the initialization with a `cmpxchg`, similar to how method counters are initialized: >> https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L644-L646 >> >> Since [current code](https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.inline.hpp#L41-L46) in `Method::set_method_data` uses a `Atomic::release_store`, I added a `OrderAccess::release()`. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Completely removed MethodData_lock > I therefore think it's fine to avoid a lock in this case. Also, we already use a lock-free mechanism for `Method::build_method_counters`. `Method::build_method_counters` doesn't appear to be lock-free, in the sense there are no atomic operations; rather concurrency just doesn't seem to be a concern with that code. ?? But as long as the overhead is low and interference unlikely, then we can see how this works out in practice. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/11316 From dholmes at openjdk.org Tue Nov 29 04:16:28 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Nov 2022 04:16:28 GMT Subject: RFR: 8297499: Parallel: Missing iteration over klass when marking objArrays/objArrayOops during Full GC In-Reply-To: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> References: <tGrG_WvJCyc6Mq34YcDiqA29rdTUKd40_n6X-kTmHRs=.f5c4a474-5e82-489e-ae55-ed5e9d73e7c1@github.com> Message-ID: <cXvm3Mn07Ie0VMIggnydC8_jWTUfmhEqfmtUMSv6w7o=.749a9012-fc8d-4a9c-9027-1c5b1a569738@github.com> On Wed, 23 Nov 2022 12:55:55 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote: > Extending the current class-unloading test to expose a pre-existing issue in Parallel and the fix. 
> > Test: the revised test fails for Parallel without the fix The test change is not robust. If the code is JIT'd then all the locals can be nulled at the same time and so break the test's expectations - ref https://bugs.openjdk.org/browse/JDK-8297740 ------------- PR: https://git.openjdk.org/jdk/pull/11321 From dholmes at openjdk.org Tue Nov 29 04:27:59 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Nov 2022 04:27:59 GMT Subject: RFR: 8297693: Fix typos in src/hotspot and test/hotspot files In-Reply-To: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> References: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> Message-ID: <Cr-i-DgNQ3EnSGvByRKuPb4x5jMpfyXCoEp95ITTSwg=.18713fbb-2c6b-4c52-a4c8-6281ca8a69e1@github.com> On Mon, 28 Nov 2022 09:51:25 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > Can I please get a review for this change which only fixes typos in src/hotspot and test/hotspot files? These changes were originally done by @mernst in PR https://github.com/openjdk/jdk/pull/10029, but given that the other PR touches multiple other files and areas, the progress was stalled. > > This PR introduces only hotspot related typo fixes that Michael had proposed in the other PR plus also includes a review suggestion (in one of these files) that was made by Alexey in the other PR. Looks good and trivial. One further issue spotted around one of the fixes but feel free ignore. Thanks for taking care of this @jaikiran ! test/hotspot/jtreg/vmTestbase/nsk/share/locks/DeadlockMaker.java line 31: > 29: /* > 30: * Class used to create deadlocked threads. It is possible to create 2 or more deadlocked thread, also > 31: * it is possible to specify resource of which type should lock each deadlocked thread Even with the corrections this comment still makes little sense, but lets not get bogged down trying to improve these old tests. 
If you feel like it s/thread,/threads,/ and add a full-stop at the end of the comment. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/11386 From dholmes at openjdk.org Tue Nov 29 04:42:18 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Nov 2022 04:42:18 GMT Subject: RFR: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests [v2] In-Reply-To: <lN-yC7mKMa8G09-YwxZCsi9fQ02j9ZFvUy4JrrRPbSQ=.a254424d-00f3-495b-a85a-f2f9f77ab288@github.com> References: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> <lN-yC7mKMa8G09-YwxZCsi9fQ02j9ZFvUy4JrrRPbSQ=.a254424d-00f3-495b-a85a-f2f9f77ab288@github.com> Message-ID: <TIAfQIcbUhuO3BgH7YI4eEb5Yf3PCFmrox8BqzYrVTE=.0f4b589c-a386-4f88-ac0a-6c54314df676@github.com> On Tue, 29 Nov 2022 02:36:12 GMT, Alex Menkov <amenkov at openjdk.org> wrote: >> The fix combines almost the same tests to 1 test to remove code duplication > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Used multiple test tags Looks good - nice consolidation. One nit with some pre-existing badly worded text. Thanks. test/hotspot/jtreg/vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount/reset001.java line 37: > 35: * that, resetPeakThreadCount() is invoked to reset the peak. Then > 36: * getPeakThreadCount() and getThreadCount() must return the same values. The > 37: * preposition is that no threads are appered/disappeared between This does not read correctly even ignoring the typo in "appered" - suggestion: > The expectation is that no threads are created, or terminate, between ... And please also fix the same comment at lines 118/119 ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11400 From mernst at openjdk.org Tue Nov 29 05:14:18 2022 From: mernst at openjdk.org (Michael Ernst) Date: Tue, 29 Nov 2022 05:14:18 GMT Subject: RFR: 8297693: Fix typos in src/hotspot and test/hotspot files In-Reply-To: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> References: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> Message-ID: <Jx5dmmu6sI8CYyuw-QMyK4zNM-kuTCIqW8eIiHtbHO4=.a8af3c9b-2b52-4c71-8619-e6009e98eb5d@github.com> On Mon, 28 Nov 2022 09:51:25 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > Can I please get a review for this change which only fixes typos in src/hotspot and test/hotspot files? These changes were originally done by @mernst in PR https://github.com/openjdk/jdk/pull/10029, but given that the other PR touches multiple other files and areas, the progress was stalled. > > This PR introduces only hotspot related typo fixes that Michael had proposed in the other PR plus also includes a review suggestion (in one of these files) that was made by Alexey in the other PR. @jaikiran Thanks so much for your help with this. 
------------- PR: https://git.openjdk.org/jdk/pull/11386 From lmesnik at openjdk.org Tue Nov 29 05:58:24 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 29 Nov 2022 05:58:24 GMT Subject: RFR: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests [v2] In-Reply-To: <lN-yC7mKMa8G09-YwxZCsi9fQ02j9ZFvUy4JrrRPbSQ=.a254424d-00f3-495b-a85a-f2f9f77ab288@github.com> References: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> <lN-yC7mKMa8G09-YwxZCsi9fQ02j9ZFvUy4JrrRPbSQ=.a254424d-00f3-495b-a85a-f2f9f77ab288@github.com> Message-ID: <kgvcWjU1oBxR4xHoRm0d5seM2ZsXoiebmneIb_mUs70=.edca6f64-2ca8-4769-896b-1ba14b265ed9@github.com> On Tue, 29 Nov 2022 02:36:12 GMT, Alex Menkov <amenkov at openjdk.org> wrote: >> The fix combines almost the same tests to 1 test to remove code duplication > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Used multiple test tags Marked as reviewed by lmesnik (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11400 From xuelei at openjdk.org Tue Nov 29 06:44:39 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Tue, 29 Nov 2022 06:44:39 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v16] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <HiybHNIDQGTOx25SnDscYPQCREM9lzhY9UaNeHaw05w=.b1c8460b-0095-4b40-92d2-62d1fbb1ce9f@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: use checked snprintf for adlc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/4143f51e..d1a48254 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=14-15 Stats: 43 lines in 7 files changed: 12 ins; 1 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From fyang at openjdk.org Tue Nov 29 07:15:30 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 29 Nov 2022 07:15:30 GMT Subject: RFR: 8297697: RISC-V: Add support for SATP mode detection [v2] In-Reply-To: <TjIFSBdChwOzZweXDlzPJpm4UBVzbXZacmVa9Z2McNo=.c2766f6a-bb06-4f44-9308-f93d8259dded@github.com> References: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> <TjIFSBdChwOzZweXDlzPJpm4UBVzbXZacmVa9Z2McNo=.c2766f6a-bb06-4f44-9308-f93d8259dded@github.com> Message-ID: <goUw2sX_XgOAVjg6HF3eBZDqWSUWlGb3vmIJw0fH_Ek=.b08ebb84-354f-419c-8677-02863438cb35@github.com> On Tue, 29 Nov 2022 01:41:48 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: >> RISC-V gets sv57-based virtual memory support since Linux 5.18 [1]. There are some reports of the OpenJDK RISC-V port crashing on Linux 5.18+ with QEMU-system 7.10+ when sv57 was enabled [2][3] as currently RISC-V port only supports up to sv48. >> As discussed in [3], given the fact that there are no existing boards or hardware even support anything more than sv48, >> we decide to add detection for SATP (Supervisor Address Translation and Protection) mode at JVM startup time if possible and explicitly issue a warning and stop early when sv57 is enabled. 
>> >> When sv57 is enabled, the output of java -version would be: >> >> >> root@qemuriscv64:~# jdk/bin/java -version >> Error occurred during initialization of VM >> Unsupported satp mode: sv57 >> >> >> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa5b537b0ecc16992577b013f11112d54c7ce869 >> [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000639.html >> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000681.html >> >> Testing: >> >> - QEMU-system with sv48/sv57-enabled Linux image `-version` test >> - HiFive Unmatched board (sv39) `-version` test > > Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: > > print vm mode string instead of vm mode code src/hotspot/cpu/riscv/vm_version_riscv.cpp line 41: > 39: > 40: // check if satp.mode is supported, currently supports up to SV48(RV64)/SV32(RV32) > 41: if (get_satp_mode() > RISCV64_ONLY(VM_SV48) RISCV32_ONLY(VM_SV32)) { I am not sure whether it makes sense to consider SV32 here. We only support RV32 Zero for now and haven't seen a similar issue for it yet.
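For readers following the thread, the check under review can be sketched as a comparison against the highest supported translation mode. This is only an illustration covering the RV64 case raised in the comment above: the enum `SatpModeSketch` and the functions `satp_mode_name`, `satp_mode_supported` and `satp_startup_error` are hypothetical names, not the identifiers used in the patch, and how the mode is actually detected at startup is left out. Only the error text is taken from the PR description.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum (not the one in the patch). The enumerators are ordered
// by increasing virtual address width so that modes can be compared.
enum class SatpModeSketch { Bare = 0, Sv39, Sv48, Sv57, Sv64 };

// Maps a mode to the lower-case name used in the error message.
const char* satp_mode_name(SatpModeSketch mode) {
  switch (mode) {
    case SatpModeSketch::Bare: return "bare";
    case SatpModeSketch::Sv39: return "sv39";
    case SatpModeSketch::Sv48: return "sv48";
    case SatpModeSketch::Sv57: return "sv57";
    case SatpModeSketch::Sv64: return "sv64";
  }
  assert(false && "unhandled satp mode");
  return "unknown";
}

// Assumption for this sketch: on RV64, everything up to and including sv48
// is supported and anything above it is rejected.
bool satp_mode_supported(SatpModeSketch mode) {
  return mode <= SatpModeSketch::Sv48;
}

// Returns an empty string when startup may continue, otherwise the message
// that would be reported before stopping early.
std::string satp_startup_error(SatpModeSketch mode) {
  if (satp_mode_supported(mode)) {
    return std::string();
  }
  return std::string("Unsupported satp mode: ") + satp_mode_name(mode);
}
```

With this shape, dropping or keeping a separate RV32 limit only changes the constant used in `satp_mode_supported`.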
------------- PR: https://git.openjdk.org/jdk/pull/11388 From thartmann at openjdk.org Tue Nov 29 07:36:18 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 29 Nov 2022 07:36:18 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v2] In-Reply-To: <ilh-e9IZV-lFMQEq3rMbOxX927SNc75syjfkmazMBEI=.a90a961d-0494-491e-9020-a6952501cd17@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> <_3SkZlJhWP_RHEgQDrTfSPp9pv8sS8jM4ewrIhucIcg=.a256713b-1f9a-45a7-a13f-e101ceca297b@github.com> <ilh-e9IZV-lFMQEq3rMbOxX927SNc75syjfkmazMBEI=.a90a961d-0494-491e-9020-a6952501cd17@github.com> Message-ID: <YWy2NTRtowyJjRBmh4_7OrxDDfX9ElyDVEnrH-q7eG0=.fb141393-4137-431c-8f4b-677e817d4ce1@github.com> On Tue, 29 Nov 2022 01:01:02 GMT, Dean Long <dlong at openjdk.org> wrote: > I'm not sure if it's guaranteed that replay runs single-threaded. I haven't looked into the details, but I think ReplayInline can run in any compiler thread after startup, and then of course there is JDK-8254110. But ReplayInline does not load ciMethodData, i.e., does not call `process_ciMethodData`, right? The only way this code can get executed is through `JNI_CreateJavaVM_inner -> ciReplay::replay -> ciReplay::replay_impl -> process -> process_command -> process_ciMethodData` and that only ever happens single-threaded. Of course, if we are ever going to implement [JDK-8254110](https://bugs.openjdk.org/browse/JDK-8254110), we need to revisit that code, but in that case the assert that I added will trigger. What do you think? > Method::build_method_counters doesn't appear to be lock-free, in the sense there are no atomic operations; rather concurrency just doesn't seem to be a concern with that code.
It uses the exact same mechanism that I now added to `Method::build_profiling_method_data`, including atomic operations to initialize `_method_counters`: https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L651-L654 And I verified that multiple threads attempt to initialize the counters by adding verification code to: https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L644-L646 ------------- PR: https://git.openjdk.org/jdk/pull/11316 From xuelei at openjdk.org Tue Nov 29 07:57:36 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Tue, 29 Nov 2022 07:57:36 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v17] In-Reply-To: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> Message-ID: <v-LA3R9EgJGM_01c54w4H8ahI7BGPuKGSSpj2Ms7L84=.8dac50eb-2131-4413-a05f-be2e5db9eff0@github.com> > Hi, > > May I have this update reviewed? > > The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. 
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: comment for snprintf_checked ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11115/files - new: https://git.openjdk.org/jdk/pull/11115/files/d1a48254..c7dd001b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11115&range=15-16 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11115.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11115/head:pull/11115 PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Tue Nov 29 08:01:11 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Tue, 29 Nov 2022 08:01:11 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v12] In-Reply-To: <chToHFhLolTySTkjmihCukuyT2OydSgoJVRagHjzpA8=.21e615e3-631f-412c-ac7b-f5ad1b4fec02@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <mztLRX-PTuyfSXhNZR9d9z8Ax5pIz5j4UVIrIZVGst4=.ddc4ca8b-253a-4e25-96e5-0233465817da@github.com> <NZslB0NAOVhw3IKe-_JWMhfiSJb4p52wHnftlPsz86E=.44faae07-e7c6-4810-aab7-8019c8808c8a@github.com> <chToHFhLolTySTkjmihCukuyT2OydSgoJVRagHjzpA8=.21e615e3-631f-412c-ac7b-f5ad1b4fec02@github.com> Message-ID: <rtLmTI3zPbjoiV8_udUDZUBW6rK-Dck6OJ2A2Cdm4Yk=.93ca57af-b731-4d2f-9d37-c3af05a9e921@github.com> On Sun, 27 Nov 2022 07:57:46 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > Given all the near-duplicated checking of os::snprintf results, I think there is a place for a helper function to package this up. Thank you for the suggestion. Updated to use snprintf_checked. 
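As a rough illustration of the helper idea discussed in this thread, here is a minimal sketch of a checked `snprintf` wrapper. It is not the code from the PR: the names `snprintf_result_ok` and `snprintf_checked_sketch` are hypothetical, and aborting on truncation is an assumed policy chosen for the example.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical predicate: did a (v)snprintf call into a buffer of `len`
// bytes succeed without truncation? `written` is the value it returned.
bool snprintf_result_ok(int written, std::size_t len) {
  // A negative value signals an encoding error; a value >= len means the
  // output plus its terminating NUL did not fit into the buffer.
  return written >= 0 && static_cast<std::size_t>(written) < len;
}

// Hypothetical helper, NOT the function added by the PR: formats into buf
// and aborts instead of silently handing back a truncated string.
int snprintf_checked_sketch(char* buf, std::size_t len, const char* fmt, ...) {
  assert(buf != nullptr && len > 0);
  std::va_list args;
  va_start(args, fmt);
  int written = std::vsnprintf(buf, len, fmt, args);
  va_end(args);
  if (!snprintf_result_ok(written, len)) {
    std::fputs("snprintf_checked_sketch: output truncated or invalid\n", stderr);
    std::abort();
  }
  return written;
}
```

Packaging the check once means call sites no longer repeat the comparison of the return value against the buffer size, which is the near-duplication the review comment pointed out.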
------------- PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Tue Nov 29 08:04:55 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Tue, 29 Nov 2022 08:04:55 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v12] In-Reply-To: <chToHFhLolTySTkjmihCukuyT2OydSgoJVRagHjzpA8=.21e615e3-631f-412c-ac7b-f5ad1b4fec02@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <mztLRX-PTuyfSXhNZR9d9z8Ax5pIz5j4UVIrIZVGst4=.ddc4ca8b-253a-4e25-96e5-0233465817da@github.com> <NZslB0NAOVhw3IKe-_JWMhfiSJb4p52wHnftlPsz86E=.44faae07-e7c6-4810-aab7-8019c8808c8a@github.com> <chToHFhLolTySTkjmihCukuyT2OydSgoJVRagHjzpA8=.21e615e3-631f-412c-ac7b-f5ad1b4fec02@github.com> Message-ID: <Jv57G-cUy7gmQMeNiLhQLnW8x8w5ipJA53z6v6ZXjKE=.fa5fd06f-3636-464d-9882-d44a71c1c0a9@github.com> On Sun, 27 Nov 2022 07:57:46 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote: > How about renaming the existing os::snprintf to something like os::snprintf_unchecked, make os::snprintf the checked version, ... The name `snprintf` may imply the function in C. For that purpose, I may use a name different from `snprintf`, but I have no idea what it could be.
------------- PR: https://git.openjdk.org/jdk/pull/11115 From xuelei at openjdk.org Tue Nov 29 08:09:21 2022 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Tue, 29 Nov 2022 08:09:21 GMT Subject: RFR: 8296812: sprintf is deprecated in Xcode 14 [v17] In-Reply-To: <v-LA3R9EgJGM_01c54w4H8ahI7BGPuKGSSpj2Ms7L84=.8dac50eb-2131-4413-a05f-be2e5db9eff0@github.com> References: <fWd_NIunUhDXwy8uZSno90GC-PM8pFCey-pAMxxO0rI=.0fede0e4-1438-4321-bc8d-d11565389cae@github.com> <v-LA3R9EgJGM_01c54w4H8ahI7BGPuKGSSpj2Ms7L84=.8dac50eb-2131-4413-a05f-be2e5db9eff0@github.com> Message-ID: <QipUxIMq3AtebKwI1OcIDG5lEPEAgIror2ToPJ4LfaU=.4abcf6af-f58e-4570-9094-1bf3ac9e29c7@github.com> On Tue, 29 Nov 2022 07:57:36 GMT, Xue-Lei Andrew Fan <xuelei at openjdk.org> wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14 because of security concerns, and the use of it causing building failure. The build could pass if warnings are disabled for codes that use sprintf method. For the long run, the sprintf could be replaced with snprintf. This patch is trying to check if snprintf could be used. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > comment for snprintf_checked Please review the last update, and hopefully we are close to an agreement. Thanks! 
------------- PR: https://git.openjdk.org/jdk/pull/11115 From amenkov at openjdk.org Tue Nov 29 08:42:29 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 29 Nov 2022 08:42:29 GMT Subject: RFR: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests [v3] In-Reply-To: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> References: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> Message-ID: <fSDug5Tre90s5oTZ9YHGnYJpDSbqPH5O9ZsElYRO17E=.399cc8de-7fe8-4bdf-ae04-f46648ee15aa@github.com> > The fix combines almost the same tests to 1 test to remove code duplication Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: Fixed comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11400/files - new: https://git.openjdk.org/jdk/pull/11400/files/393f1252..f3a2411c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11400&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11400&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11400.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11400/head:pull/11400 PR: https://git.openjdk.org/jdk/pull/11400 From amenkov at openjdk.org Tue Nov 29 08:48:22 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 29 Nov 2022 08:48:22 GMT Subject: RFR: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests [v2] In-Reply-To: <TIAfQIcbUhuO3BgH7YI4eEb5Yf3PCFmrox8BqzYrVTE=.0f4b589c-a386-4f88-ac0a-6c54314df676@github.com> References: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> <lN-yC7mKMa8G09-YwxZCsi9fQ02j9ZFvUy4JrrRPbSQ=.a254424d-00f3-495b-a85a-f2f9f77ab288@github.com> <TIAfQIcbUhuO3BgH7YI4eEb5Yf3PCFmrox8BqzYrVTE=.0f4b589c-a386-4f88-ac0a-6c54314df676@github.com> Message-ID: 
<IgKp8N0PsN-VIO397EO2q5_ixr2I7602y8_4O96zpCE=.5d036f2d-9b6e-48b4-ada0-fac9ecdb9c99@github.com> On Tue, 29 Nov 2022 04:34:22 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> Used multiple test tags > > test/hotspot/jtreg/vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount/reset001.java line 37: > >> 35: * that, resetPeakThreadCount() is invoked to reset the peak. Then >> 36: * getPeakThreadCount() and getThreadCount() must return the same values. The >> 37: * preposition is that no threads are appered/disappeared between > > This does not read correctly even ignoring the typo in "appered" - suggestion: > >> The expectation is that no threads are created, or terminate, between ... > > And please also fix the same comment at lines 118/119 Done. I'm not sure about commas (for me it's better without them) Fixed here as you suggested and without commas in lines 118-119 :) Please add a comment if you prefer consistency ------------- PR: https://git.openjdk.org/jdk/pull/11400 From kbarrett at openjdk.org Tue Nov 29 09:07:24 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 29 Nov 2022 09:07:24 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas In-Reply-To: <CVHNRNKRV_f2n8F5s0AiQR7lgIrHBHzaUj5ewTDMU7I=.9920de89-7b18-4f22-9ca8-94479d4a292f@github.com> References: <CVHNRNKRV_f2n8F5s0AiQR7lgIrHBHzaUj5ewTDMU7I=.9920de89-7b18-4f22-9ca8-94479d4a292f@github.com> Message-ID: <okiLiw-ASZcv9Cr4VaZX72_QsJz4V3lITjkfkdjetiU=.c293ff3d-1069-423d-b1ee-1c9630b8129a@github.com> On Tue, 29 Nov 2022 01:03:55 GMT, Julian Waters <jwaters at openjdk.org> wrote: > Add alignas to the permitted features set with some restrictions. (Thanks @kimbarrett for the help) I think this is okay (not surprisingly). Please add me as a contributor. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11404 From jpai at openjdk.org Tue Nov 29 09:15:20 2022 From: jpai at openjdk.org (Jaikiran Pai) Date: Tue, 29 Nov 2022 09:15:20 GMT Subject: RFR: 8297693: Fix typos in src/hotspot and test/hotspot files [v2] In-Reply-To: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> References: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> Message-ID: <-lbBaUaP4j0boRFXIztrwuaki5AQ5Z-6RhCMSsoHKF0=.026f7ce8-2118-4c26-b629-40adbcf8ce30@github.com> > Can I please get a review for this change which only fixes typos in src/hotspot and test/hotspot files? These changes were originally done by @mernst in PR https://github.com/openjdk/jdk/pull/10029, but given that the other PR touches multiple other files and areas, the progress was stalled. > > This PR introduces only hotspot related typo fixes that Michael had proposed in the other PR plus also includes a review suggestion (in one of these files) that was made by Alexey in the other PR. 
Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision: Address David's review suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11386/files - new: https://git.openjdk.org/jdk/pull/11386/files/a7594c33..cc2a47d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11386&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11386&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11386.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11386/head:pull/11386 PR: https://git.openjdk.org/jdk/pull/11386 From jpai at openjdk.org Tue Nov 29 09:15:22 2022 From: jpai at openjdk.org (Jaikiran Pai) Date: Tue, 29 Nov 2022 09:15:22 GMT Subject: RFR: 8297693: Fix typos in src/hotspot and test/hotspot files [v2] In-Reply-To: <Cr-i-DgNQ3EnSGvByRKuPb4x5jMpfyXCoEp95ITTSwg=.18713fbb-2c6b-4c52-a4c8-6281ca8a69e1@github.com> References: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> <Cr-i-DgNQ3EnSGvByRKuPb4x5jMpfyXCoEp95ITTSwg=.18713fbb-2c6b-4c52-a4c8-6281ca8a69e1@github.com> Message-ID: <1L5Nz1XNtHwi_gDM4Du0Cv-ML6hFr5YvIuKKXEs_P9I=.eb426fb5-7be1-4601-bd4c-1ab3b2fd2442@github.com> On Tue, 29 Nov 2022 04:23:56 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision: >> >> Address David's review suggestion > > test/hotspot/jtreg/vmTestbase/nsk/share/locks/DeadlockMaker.java line 31: > >> 29: /* >> 30: * Class used to create deadlocked threads. It is possible to create 2 or more deadlocked thread, also >> 31: * it is possible to specify resource of which type should lock each deadlocked thread > > Even with the corrections this comment still makes little sense, but lets not get bogged down trying to improve these old tests. 
If you feel like it s/thread,/threads,/ and add a full-stop at the end of the comment. Done. I've updated the PR to include this suggested text. ------------- PR: https://git.openjdk.org/jdk/pull/11386 From ngasson at openjdk.org Tue Nov 29 09:45:15 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Tue, 29 Nov 2022 09:45:15 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v5] In-Reply-To: <askgzzGaeFCd8lefaQsn090eUx0HLZ-nn3lmz0ZjSOw=.457403b6-5564-4e30-a55d-3b6dbe59d7ca@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> <askgzzGaeFCd8lefaQsn090eUx0HLZ-nn3lmz0ZjSOw=.457403b6-5564-4e30-a55d-3b6dbe59d7ca@github.com> Message-ID: <CIgyFu3w_lsdObELnvjcwyf2sVNlA53xICQL63ovYZA=.ca2852db-7cf4-44e0-aef3-ff02ff248b7c@github.com> On Thu, 24 Nov 2022 15:56:08 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. 
Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Resolve merge conflicts with master > - Merge branch 'master' into JDK-8293488 > - Removed svesha3 feature check for eor3 > - Changed the modifier order preference in JTREG test > - Modified JTREG test to include feature constraints > - 8293488: Add EOR3 backend rule for aarch64 SHA3 extension > > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those > SHA3 instructions - "eor3" performs an exclusive OR of three vectors. > This is helpful in applications that have multiple, consecutive "eor" > operations which can be reduced by clubbing them into fewer operations > using the "eor3" instruction. For example - > eor a, a, b > eor a, a, c > can be optimized to single instruction - eor3 a, b, c > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and > a micro benchmark to assess the performance gains with this patch. > Following are the results of the included micro benchmark on a 128-bit > aarch64 machine that supports Neon, SVE2 and SHA3 features - > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > The numbers shown are performance gains with using Neon eor3 instruction > over the master branch that uses multiple "eor" instructions instead. > Similar gains can be observed with the SVE2 "eor3" version as well since > the "eor3" instruction is unpredicated and the machine under test uses a > maximum vector width of 128 bits which makes the SVE2 code generation very > similar to the one with Neon. 
test/hotspot/gtest/aarch64/aarch64-asmtest.py line 1043: > 1041: [str(self.reg[i]) for i in range(1, self.numRegs)])) > 1042: def astr(self): > 1043: if self._name == "eor3": Suggestion: firstArg = 0 if self._name == "eor3" else 1 formatStr = "%s%s" + ''.join([", %s" for i in range(firstArg, self.numRegs)]) And similarly below. ------------- PR: https://git.openjdk.org/jdk/pull/10407 From kevinw at openjdk.org Tue Nov 29 09:54:30 2022 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 29 Nov 2022 09:54:30 GMT Subject: RFR: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests [v3] In-Reply-To: <fSDug5Tre90s5oTZ9YHGnYJpDSbqPH5O9ZsElYRO17E=.399cc8de-7fe8-4bdf-ae04-f46648ee15aa@github.com> References: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> <fSDug5Tre90s5oTZ9YHGnYJpDSbqPH5O9ZsElYRO17E=.399cc8de-7fe8-4bdf-ae04-f46648ee15aa@github.com> Message-ID: <_3IfV8fxD730zo1yNeHylJic81YRJkVbXqor8z_yf5Q=.f7a8d3ea-7669-4726-abb9-acff8dbaca27@github.com> On Tue, 29 Nov 2022 08:42:29 GMT, Alex Menkov <amenkov at openjdk.org> wrote: >> The fix combines almost the same tests to 1 test to remove code duplication > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Fixed comments Marked as reviewed by kevinw (Committer). 
looks good 8-) ------------- PR: https://git.openjdk.org/jdk/pull/11400 From iwalulya at openjdk.org Tue Nov 29 09:56:33 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 29 Nov 2022 09:56:33 GMT Subject: RFR: 8296954: G1: Enable parallel scanning for heap region remset [v2] In-Reply-To: <FKNKreS1s3_gDAnPWr-c2S2p3OkbCiyv7OsjBdY5D3w=.0ce776c5-bc81-4fa8-bdc0-deb9fe4731f1@github.com> References: <fqG3Jl3VV3LulSnoR6FoVIqQ3ETGtQZf9ZRS7mKVHDM=.2fc28e74-04df-48b5-a3d7-fd8902a86cc2@github.com> <FKNKreS1s3_gDAnPWr-c2S2p3OkbCiyv7OsjBdY5D3w=.0ce776c5-bc81-4fa8-bdc0-deb9fe4731f1@github.com> Message-ID: <h0CTcCp1AH69Byxg9NdZ4yABo1T38Im4NbLOE7ge7-0=.e4bcbef3-a230-472a-80f2-bfafd3f3a9b8@github.com> On Fri, 18 Nov 2022 13:14:20 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> Thomas review > > Lgtm. Thanks @tschatzl and @albertnetymk for the reviews. ------------- PR: https://git.openjdk.org/jdk/pull/11173 From aboldtch at openjdk.org Tue Nov 29 09:58:39 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 29 Nov 2022 09:58:39 GMT Subject: RFR: 8297235: ZGC: assert(regs[i] != regs[j]) failed: Multiple uses of register: rax Message-ID: <jYsSVWwRPyRKvs8HYIwp6Ar5Z0XISOwp1DM4A0AgdOQ=.378c74fb-a4e1-45a5-82b0-838c2fcba43a@github.com> Tests java/util/stream/test/org/openjdk/tests/java/util/* with -XX:+UseZGC -Xcomp -XX:-TieredCompilation crash with `assert(regs[i] != regs[j]) failed: Multiple uses of register: rax`. More specifically, the crash occurs during compilation of java.util.concurrent.ForkJoinTask::awaitDone. The reason seems to be that the compare value and the memory input end up sharing a register. (The code uses an Unsafe CAS which CASes an object reference into a field of that same object, `oldval: rax` and `mem: [rax+offset]`). The Z load barrier stub dispatch implementation requires that the reference and reference address occupy distinct registers.
In the loadP nodes this is established by marking all but the memory as TEMP, which results in no sharing. This is not possible for the CompareAndSwapP / CompareAndExchangeP nodes as the compare value is an input node. The solution proposed here is less than ideal as it makes the CAS nodes require one extra TEMP register, which in the common case is unused. This puts unnecessary extra strain on the register allocation. The problem is that there is no way currently (that I can find) to express in .ad that a memory input must not share registers with a specific other input. There is an alternative solution for this specific crash which does not use a second TEMP register (see commit: cfd5ced4e97e986fc10c5a8721b543cd3101c58a). It accomplishes this by using the same trick that the aarch64 Z CAS node uses, which is to specify the memory as indirect, which results in the address being LEA'd into a register. However, from what I can see this does not guarantee that the address and the reference do not share a register (`oldval: rax` and `mem: [rax]`). So it is theoretically broken (and so is the aarch64 implementation). It is unclear to me if there is ever a way for C2 to generate a CAS which compares the address of the field with its content. I call on anyone with more knowledge about `adlc` and `C2` for feedback. And specifically I want to open up a discussion with these points: * Is there some other way of expressing in the .ad file that a memory input should not share some register? * If not, is this a worthwhile RFE? It seems to be a pattern used in at least some other places in Z. * Will the indirect input ever share a register with oldval and/or are the aarch64/riscv implementations broken because of this? How about ppc?
Testing: linux-x64 zgc tagged tests tier 1-7 and some specific crashing tests with `-XX:+UseZGC -Xcomp -XX:-TieredCompilation` (in: java/util/stream/, java/util/concurrent/) ------------- Commit messages: - JDK-8297235: ZGC: assert(regs[i] != regs[j]) failed: Multiple uses of register: rax Changes: https://git.openjdk.org/jdk/pull/11410/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11410&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297235 Stats: 58 lines in 1 file changed: 26 ins; 22 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/11410.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11410/head:pull/11410 PR: https://git.openjdk.org/jdk/pull/11410 From iwalulya at openjdk.org Tue Nov 29 09:58:51 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 29 Nov 2022 09:58:51 GMT Subject: Integrated: 8296954: G1: Enable parallel scanning for heap region remset In-Reply-To: <fqG3Jl3VV3LulSnoR6FoVIqQ3ETGtQZf9ZRS7mKVHDM=.2fc28e74-04df-48b5-a3d7-fd8902a86cc2@github.com> References: <fqG3Jl3VV3LulSnoR6FoVIqQ3ETGtQZf9ZRS7mKVHDM=.2fc28e74-04df-48b5-a3d7-fd8902a86cc2@github.com> Message-ID: <17ib62gydKoBiiDaH_BDatAJi7IiNlUm1lFYXieSv9U=.e0822e3d-d61c-4a63-b066-f19a1123f771@github.com> On Tue, 15 Nov 2022 17:57:24 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote: > Hi all, > > Please review this change that allows parallel scanning of a heap region's remembered set. This gives a more balanced workload distribution in cases where cards are unevenly distributed among remembered sets. > > Testing: Tier 1-3 > > Thanks This pull request has now been integrated.
Changeset: 33dfc7d2 Author: Ivan Walulya <iwalulya at openjdk.org> URL: https://git.openjdk.org/jdk/commit/33dfc7d2eface68a6a1edbb507abefa74cc6180f Stats: 33 lines in 8 files changed: 30 ins; 0 del; 3 mod 8296954: G1: Enable parallel scanning for heap region remset Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.org/jdk/pull/11173 From eosterlund at openjdk.org Tue Nov 29 10:05:38 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 29 Nov 2022 10:05:38 GMT Subject: RFR: 8297427: Avoid keeping class loaders alive when executing ClassLoaderStatsVMOperation In-Reply-To: <YdNWoA79MYjB7s-lcu1eYljYAGoV46gjbnYqX0PzJGc=.93d62d9d-0642-4a0d-9f90-b8b1885d4ace@github.com> References: <YdNWoA79MYjB7s-lcu1eYljYAGoV46gjbnYqX0PzJGc=.93d62d9d-0642-4a0d-9f90-b8b1885d4ace@github.com> Message-ID: <ZJCYMZzl8bIOmlyTwKTH-uRA0eqwr91RoFu--dbww94=.ac6823a3-11f9-4fb9-b4bb-477d253d3fe7@github.com> On Tue, 22 Nov 2022 20:54:54 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote: > Please review this change to avoid keeping classes alive only due to the `ClassLoaderStatsVMOperation`. > > **Summary** > The `ClassLoaderStatsVMOperation` is gathering statistics about the active class loaders in a safepoint. The way the `ClassLoaderDataGraph` is iterated will keep the class loaders live. This is not really needed since everything is done in a safepoint and nothing needs to be explicitly kept alive. This has not been a problem prior to concurrent class unloading in ZGC. With fully concurrent class unloading a `ClassLoaderStatsVMOperation` can occur during a collection and more classes than needed might be kept alive. This could in turn lead to premature Metaspace OOM. > > The solution is to not keep the class loaders alive due to the iteration in `ClassLoaderStatsVMOperation`. > > **Testing** > * Added a new test that covers the two different ways a class could previously be kept alive by the VM operation. 
The test passes after the fix but failed before. > * Mach5 tier 1-3 Looks good to me. Nice test! ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/11300 From aboldtch at openjdk.org Tue Nov 29 11:35:42 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 29 Nov 2022 11:35:42 GMT Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability [v3] In-Reply-To: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> Message-ID: <2umMbTnSFAMRMSFZRsl5qBaMwCG9FmRPJIv0RdyzZK0=.87d22dcf-d524-4f4d-b28a-647495cb7fee@github.com> > Refactor the STEP macro in VMError::report to improve readability. > Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand. > > This enhancement aims to do two things: > 1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer. > 2. 
Separates the internal step logic from the decision logic which decides if a step should be taken with a STEP_IF(step_name_str, condition) macro > > Testing: tier 1 + GHA Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Add newlines ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11018/files - new: https://git.openjdk.org/jdk/pull/11018/files/b483c21c..165e8ae8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11018&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11018&range=01-02 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11018.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11018/head:pull/11018 PR: https://git.openjdk.org/jdk/pull/11018 From aboldtch at openjdk.org Tue Nov 29 11:39:55 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 29 Nov 2022 11:39:55 GMT Subject: RFR: 8296470: Refactor VMError::report STEP macro to improve readability [v3] In-Reply-To: <2umMbTnSFAMRMSFZRsl5qBaMwCG9FmRPJIv0RdyzZK0=.87d22dcf-d524-4f4d-b28a-647495cb7fee@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> <2umMbTnSFAMRMSFZRsl5qBaMwCG9FmRPJIv0RdyzZK0=.87d22dcf-d524-4f4d-b28a-647495cb7fee@github.com> Message-ID: <z-EdNhaG4yPi91AbBSw380kBhhOJjKx_WDX9U-jIYg8=.61c80c4b-d624-4f0e-b220-a288dd235e1a@github.com> On Tue, 29 Nov 2022 11:35:42 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: >> Refactor the STEP macro in VMError::report to improve readability. >> Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand. >> >> This enhancement aims to do two things: >> 1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer. >> 2. 
Separates the internal step logic from the decision logic which decides if a step should be taken with a STEP_IF(step_name_str, condition) macro >> >> Testing: tier 1 + GHA > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Add newlines No worries, thanks for the reviews. ------------- PR: https://git.openjdk.org/jdk/pull/11018 From aboldtch at openjdk.org Tue Nov 29 11:41:46 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 29 Nov 2022 11:41:46 GMT Subject: Integrated: 8296470: Refactor VMError::report STEP macro to improve readability In-Reply-To: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> References: <Lu1xlm8rFylQo0JmhCXoLqG2-5au4BNC4tD8QBRxPV0=.03dce86f-ec56-40e3-8fac-5cc469a4f8fb@github.com> Message-ID: <QtrN2iBZFrHeUi3nsRV_pBYTyad_SFAN5RfGLysQflw=.c048eef8-36e6-4343-8d29-84b274f588d6@github.com> On Mon, 7 Nov 2022 13:25:53 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Refactor the STEP macro in VMError::report to improve readability. > Right now the macro contains multiple statements on one line and the non-conventional control flow is even harder to understand. > > This enhancement aims to do two things: > 1. It splits the macro into multiple lines with indentations which makes the structure of the C++ code generated by the preprocessor clearer. > 2. Separates the internal step logic from the decision logic which decides if a step should be taken with a STEP_IF(step_name_str, condition) macro > > Testing: tier 1 + GHA This pull request has now been integrated. 
Changeset: 1301fb0b Author: Axel Boldt-Christmas <aboldtch at openjdk.org> URL: https://git.openjdk.org/jdk/commit/1301fb0b5f998c9cf8bcd8a53e6a90d6ab5a7da9 Stats: 733 lines in 1 file changed: 169 ins; 297 del; 267 mod 8296470: Refactor VMError::report STEP macro to improve readability Reviewed-by: stuefe, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/11018 From aph at openjdk.org Tue Nov 29 11:51:39 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 29 Nov 2022 11:51:39 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33] In-Reply-To: <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> Message-ID: <nvUOFle6Lyz7hONaLZhECGu-MKhbk3Zngvy85-tQw28=.15ec4a6d-f5f7-435b-a0d7-667142dd1056@github.com> On Thu, 24 Nov 2022 14:05:41 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Unused variable src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 385: > 383: try { > 384: JLA.setScopedValueBindings(newSnapshot); > 385: JLA.ensureMaterializedForStackWalk(newSnapshot); Question: is it necessary here to invoke `ensureMaterializedForStackWalk()`? It's really only there to prevent the new `Snapshot` from being scalar replaced. But we know that it cannot be scalar replaced, because it really does escape: a pointer to it is stored in the current `Thread`. So should we simply remove the call to `ensureMaterializedForStackWalk()`, on the grounds that it cannot have any effect?
------------- PR: https://git.openjdk.org/jdk/pull/10952 From duke at openjdk.org Tue Nov 29 12:10:11 2022 From: duke at openjdk.org (Afshin Zafari) Date: Tue, 29 Nov 2022 12:10:11 GMT Subject: RFR: 8287400: Make BitMap range parameter names consistent [v2] In-Reply-To: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> References: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> Message-ID: <6MIyyyl41Wje3Zsvzijy7LQzsZXyvuG5G-vHSttHMho=.1bbd25b3-6ca6-4735-9666-c360329ae2cf@github.com> > The ranges are determined by 'start' and 'end' all over the bitMap.hpp and bitMap.cpp. All instances of other names for start and end are replaced. Afshin Zafari has updated the pull request incrementally with two additional commits since the last revision: - 8287400: Make BitMap range parameter names consistent - Revert "8287400: Make BitMap range parameter names consistent" This reverts commit 170f75aab91b3299c0be0f38c321d3025aeba7e8. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/11375/files - new: https://git.openjdk.org/jdk/pull/11375/files/170f75aa..de91b166 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11375&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11375&range=00-01 Stats: 128 lines in 2 files changed: 0 ins; 0 del; 128 mod Patch: https://git.openjdk.org/jdk/pull/11375.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11375/head:pull/11375 PR: https://git.openjdk.org/jdk/pull/11375 From duke at openjdk.org Tue Nov 29 12:10:13 2022 From: duke at openjdk.org (Afshin Zafari) Date: Tue, 29 Nov 2022 12:10:13 GMT Subject: RFR: 8287400: Make BitMap range parameter names consistent In-Reply-To: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> References: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> Message-ID: <YziSpwrhl4SqcsiW8j9wqyM3T0ixG_25sv7BFd-efdg=.7f10c9f5-ed0b-4e2a-8739-f541678cf441@github.com> On Sat, 26 Nov 2022 22:16:34 GMT, Afshin Zafari <duke at openjdk.org> wrote: > The ranges are determined by 'start' and 'end' all over the bitMap.hpp and bitMap.cpp. All instances of other names for start and end are replaced. All the ranges are determined by 'beg' and 'end'. Test: build local. 
------------- PR: https://git.openjdk.org/jdk/pull/11375 From stuefe at openjdk.org Tue Nov 29 12:18:38 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 29 Nov 2022 12:18:38 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v2] In-Reply-To: <DMhQAGoAmX9cVneCXhm_j0XnSHNPAejNn40rVPEB33E=.dd0dda69-62b6-40f5-b153-d113d4fe9b2d@github.com> References: <s2wlyE6OjqTazCsro-keOXqXvYMqwHEp8YdMZhCdQXs=.37245c9a-5891-42b8-b961-55d1a7a30af5@github.com> <g17w1wVjzNMY24qTylC7Ymgfi1MzM-J0cV_HAlBuH2s=.cfe50eb7-f157-4493-97cc-f729f1fb6eda@github.com> <DMhQAGoAmX9cVneCXhm_j0XnSHNPAejNn40rVPEB33E=.dd0dda69-62b6-40f5-b153-d113d4fe9b2d@github.com> Message-ID: <fqKss8Lsz2QmnLCDCr5e6IHKWS4Mn8d3yhWbERuAoCI=.9989067c-dcf7-4ff8-b503-37177ffca80a@github.com> On Mon, 21 Nov 2022 06:01:39 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote: > Added some limitations on reentry of a reentrant step. It will now break the inner loop if: > > * It is the fourth time reentering this step > > * It is the eighth time reentering any reentrant step > > * The stack headroom is less than 64K > > * A timeout has been issued > > > The post loop logic of a reentrant step is given another timeout window. Currently all it does is make sure there are line breaks after the step output, but I imagine this can be useful in case some reentrant step logic is used where the loop builds up some data structure and the post logic prints it. > > All of the limit constants are just picked rather ad-hoc. Would be nice to have some extra feedback on this. I think the approach is better, but I'm not a big fan of the broadened os interface and the fact that error-handling-specifics now leak into the os interface. How about letting the print function fill in an opaque continuation object instead? Something like: // Print register information; optionally re-startable.
If (*continuation_info) is null, // register printing starts with the first register, otherwise beyond whatever point // it had been interrupted before. void os::print_register_info(const ucontext_t* context, outputStream* st, void** continuation_info); `continuation_info` can have any shape or form the platform-specific implementation wants. It does not have to be visible or known on the outside. It can exist as a struct only in the os_xxx.cpp file. Or, the platform could just hide an integer in the pointer and use it as a restart point. The standard `os::print_register_info(const ucontext_t*, outputStream*)` can then be implemented with a dummy continuation info that does nothing: void os::print_register_info(const ucontext_t* c, outputStream* st) { void* info = nullptr; os::print_register_info(c, st, &info); } If that is too C-ish for you, there can be a C++ equivalent via runtime polymorphism. In any case, the charm would be that you can re-start register printing without having to know specific implementation details. You just hand in the continuation object. In a very primitive form, without the need for further STEP macros, that could even look like this: void* print_reg_continuation_info = nullptr; ... STEP(print_register_info_attempt_1) { os::print_register_info(context, st, &print_reg_continuation_info); } STEP(print_register_info_attempt_2) { os::print_register_info(context, st, &print_reg_continuation_info); } STEP(print_register_info_attempt_3) { os::print_register_info(context, st, &print_reg_continuation_info); } (for this simple approach to work, os::print_register_info() would have to recognize a completed printing by the content of continuation info). In theory, we could use a similar pattern for call stack printing too. ---- As a side note, I have not been idle since print_register_info() not working bugs me too. I found that many reasons for it not working are caused by only a few bugs.
oopDesc::print_on() could do a lot more error checking before printing Klass* for instance. And we also could do more error checking when printing object array elements. I have a patch open, but won't make it before JDK 20 freeze, and probably not before my holidays either. If someone else wants to do it, that would be fine with me too. Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/11017 From duke at openjdk.org Tue Nov 29 12:26:28 2022 From: duke at openjdk.org (Afshin Zafari) Date: Tue, 29 Nov 2022 12:26:28 GMT Subject: RFR: 8292741: Convert JvmtiTagMapTable to ResourceHashtable In-Reply-To: <c4ophxMSfP2waXPjUz3x4N57crZdQQb-d_xsOIfGmCw=.0de5f1fa-d697-4ca2-aa7a-3c5a4e030236@github.com> References: <Dj9YfY10SLn8dR1Ez3kEQe4tXm1q5W1TS8_4r3gBUsY=.6c38825a-86e5-4575-a509-683786d242a6@github.com> <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> <WC0j16HlKWRzgKhC9xjs_ZQr-42Bk7_cMOOCK3rY-yo=.c5f6e693-7d51-4277-86ab-ad250967041c@github.com> <c4ophxMSfP2waXPjUz3x4N57crZdQQb-d_xsOIfGmCw=.0de5f1fa-d697-4ca2-aa7a-3c5a4e030236@github.com> Message-ID: <SxTTf0UTexCicPDUny7qiDDizrk4cNvwsJoJORBWcpg=.6fbfa700-2216-4ebe-be6e-4fada9ddb9bb@github.com> On Mon, 28 Nov 2022 22:12:22 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Alternative for struct? > > Not sure ... didn't we start using C++ lambda's for some of these "closure" operations? @coleenp what is the usual pattern we use for this kind of thing? The `unlink` method of `ResourceHashTable` gets an ITER type and calls its `do_entry(Key&,Value&)` method. If we want to use lambdas, `unlink` should call the input directly and not one of its methods. 
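To make the difference concrete, here is a minimal sketch of the two calling conventions. It is not HotSpot code: `TinyTable`, `unlink_with_closure`, `unlink_with_function` and `RemoveOddValues` are hypothetical names made up for illustration, and the sketch assumes that `do_entry` returns true when the entry should be removed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a hash table. This is NOT ResourceHashtable.
template <typename K, typename V>
class TinyTable {
  std::vector<std::pair<K, V>> _entries;

 public:
  void put(const K& key, const V& value) { _entries.emplace_back(key, value); }
  std::size_t size() const { return _entries.size(); }

  // Lambda style: the argument itself is invoked for every entry.
  // Returning true asks the table to remove that entry.
  template <typename F>
  void unlink_with_function(F f) {
    std::size_t kept = 0;
    for (std::size_t i = 0; i < _entries.size(); i++) {
      if (!f(_entries[i].first, _entries[i].second)) {
        if (kept != i) {
          _entries[kept] = std::move(_entries[i]);
        }
        kept++;
      }
    }
    _entries.erase(_entries.begin() + static_cast<std::ptrdiff_t>(kept), _entries.end());
  }

  // Closure style: the argument is an object whose do_entry(K&, V&) is called.
  template <typename ITER>
  void unlink_with_closure(ITER* iter) {
    unlink_with_function([iter](K& key, V& value) { return iter->do_entry(key, value); });
  }
};

// Closure object in the do_entry style (assumed to return true for "remove").
struct RemoveOddValues {
  int removed = 0;
  bool do_entry(int& /*key*/, int& value) {
    if (value % 2 != 0) {
      removed++;
      return true;
    }
    return false;
  }
};

// Fills a table with the values 0..5 and unlinks the odd ones via a closure object.
inline std::size_t remaining_after_closure() {
  TinyTable<int, int> table;
  for (int i = 0; i < 6; i++) table.put(i, i);
  RemoveOddValues closure;
  table.unlink_with_closure(&closure);
  return table.size();
}

// Same operation, but the predicate is passed directly as a lambda.
inline std::size_t remaining_after_lambda() {
  TinyTable<int, int> table;
  for (int i = 0; i < 6; i++) table.put(i, i);
  table.unlink_with_function([](int& /*key*/, int& value) { return value % 2 != 0; });
  return table.size();
}
```

In this sketch the closure variant is just a thin adapter over the lambda variant, so both styles could coexist during a migration if that turns out to be useful.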
------------- PR: https://git.openjdk.org/jdk/pull/11288 From jvernee at openjdk.org Tue Nov 29 13:49:40 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 29 Nov 2022 13:49:40 GMT Subject: RFR: 8297729: Replace GrowableArray in ComputeMoveOrder with hash table Message-ID: <Ri3rjM85k80rxq7ZGW8aSHOu-ScyvzVukhVGo649UkM=.784d7aad-248f-4d20-9c5a-a7c765b6b6ab@github.com> Replaces the GrowableArray 'table' in ComputeMoveOrder with a real hash table. Through testing, I found that sometimes this array is blown up to several thousand elements, most of which are `NULL`. Using a hash table prevents this large wastage. I've touched up some of the surrounding code as well. Mostly style changes, but I've also removed the `BasicType` field from the `Move` and `MoveOperation` structs, since it was unused, and added frame data storages to the fast path for stack args as well, since both are allocated on the stack. Testing: `jdk_foreign` test suite. ------------- Depends on: https://git.openjdk.org/jdk/pull/11019 Commit messages: - Drop redundant friend decl - improve hash slightly - Use hashtable in ComputeMoveOrder + general cleanup Changes: https://git.openjdk.org/jdk/pull/11392/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11392&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297729 Stats: 61 lines in 2 files changed: 20 ins; 10 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/11392.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11392/head:pull/11392 PR: https://git.openjdk.org/jdk/pull/11392 From bkilambi at openjdk.org Tue Nov 29 13:55:36 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 29 Nov 2022 13:55:36 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v6] In-Reply-To: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> Message-ID: 
<w2IYNiW52sxcbY2klozYGqND8GNZyqqxtr2o0vmYs8Q=.2c5df2f0-f4c7-488e-86e8-8e06e52f22de@github.com> > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - > > eor a, a, b > eor a, a, c > > can be optimized to single instruction - `eor3 a, b, c` > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - > > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > > The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. 
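The rewrite relies only on the fact that XOR is associative, so two dependent XORs produce the same lanes as one three-input XOR. A minimal scalar model of that equivalence follows. It does not use real Neon or SVE2 intrinsics; `Lanes`, `xor_two_steps` and `xor_three_way` are hypothetical names, and a 128-bit vector is modelled as four 32-bit lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Models one 128-bit vector register holding four 32-bit lanes.
using Lanes = std::array<std::uint32_t, 4>;

// Two dependent operations, mirroring: eor a, a, b ; eor a, a, c
inline Lanes xor_two_steps(Lanes a, const Lanes& b, const Lanes& c) {
  for (std::size_t i = 0; i < a.size(); i++) a[i] ^= b[i];
  for (std::size_t i = 0; i < a.size(); i++) a[i] ^= c[i];
  return a;
}

// One fused three-input operation per lane, which is what eor3 computes.
inline Lanes xor_three_way(const Lanes& a, const Lanes& b, const Lanes& c) {
  Lanes result{};
  for (std::size_t i = 0; i < result.size(); i++) {
    result[i] = a[i] ^ b[i] ^ c[i];
  }
  return result;
}
```

The model says nothing about speed. The gains quoted above come from issuing one instruction instead of two dependent ones, which a scalar loop like this cannot show.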
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Improve assembler test generation for eor3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10407/files - new: https://git.openjdk.org/jdk/pull/10407/files/a0aa8cdc..6265863a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10407&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10407&range=04-05 Stats: 8 lines in 1 file changed: 0 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10407.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10407/head:pull/10407 PR: https://git.openjdk.org/jdk/pull/10407 From bkilambi at openjdk.org Tue Nov 29 13:57:36 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 29 Nov 2022 13:57:36 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v5] In-Reply-To: <CIgyFu3w_lsdObELnvjcwyf2sVNlA53xICQL63ovYZA=.ca2852db-7cf4-44e0-aef3-ff02ff248b7c@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> <askgzzGaeFCd8lefaQsn090eUx0HLZ-nn3lmz0ZjSOw=.457403b6-5564-4e30-a55d-3b6dbe59d7ca@github.com> <CIgyFu3w_lsdObELnvjcwyf2sVNlA53xICQL63ovYZA=.ca2852db-7cf4-44e0-aef3-ff02ff248b7c@github.com> Message-ID: <XUvl2ODuX0czzqSGDo23ny4e1vXm4IsR_rn-0h0qEUk=.8afc2ad4-9937-4300-96bf-070f009efa39@github.com> On Tue, 29 Nov 2022 09:41:34 GMT, Nick Gasson <ngasson at openjdk.org> wrote: >> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: >> >> - Resolve merge conflicts with master >> - Merge branch 'master' into JDK-8293488 >> - Removed svesha3 feature check for eor3 >> - Changed the modifier order preference in JTREG test >> - Modified JTREG test to include feature constraints >> - 8293488: Add EOR3 backend rule for aarch64 SHA3 extension >> >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those >> SHA3 instructions - "eor3" performs an exclusive OR of three vectors. >> This is helpful in applications that have multiple, consecutive "eor" >> operations which can be reduced by clubbing them into fewer operations >> using the "eor3" instruction. For example - >> eor a, a, b >> eor a, a, c >> can be optimized to single instruction - eor3 a, b, c >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and >> a micro benchmark to assess the performance gains with this patch. >> Following are the results of the included micro benchmark on a 128-bit >> aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> The numbers shown are performance gains with using Neon eor3 instruction >> over the master branch that uses multiple "eor" instructions instead. >> Similar gains can be observed with the SVE2 "eor3" version as well since >> the "eor3" instruction is unpredicated and the machine under test uses a >> maximum vector width of 128 bits which makes the SVE2 code generation very >> similar to the one with Neon. > > test/hotspot/gtest/aarch64/aarch64-asmtest.py line 1043: > >> 1041: [str(self.reg[i]) for i in range(1, self.numRegs)])) >> 1042: def astr(self): >> 1043: if self._name == "eor3": > > Suggestion: > > firstArg = 0 if self._name == "eor3" else 1 > formatStr = "%s%s" + ''.join([", %s" for i in range(firstArg, self.numRegs)]) > > > And similarly below. 
Thank you for the suggestion. I made the suggested changes in the latest patch. Please review. ------------- PR: https://git.openjdk.org/jdk/pull/10407 From xlinzheng at openjdk.org Tue Nov 29 14:02:51 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 29 Nov 2022 14:02:51 GMT Subject: RFR: 8297763: Fix missing stub code expansion before align() in shared trampolines Message-ID: <T5fCT6B-2hhjmlMaRuu5iHfMI-gxkuFJp3jtjGh7ZtM=.719d411d-da4f-482b-af14-c9caee93b865@github.com> This patch fixes missing stub code expansion logic before `align()` for AArch64 and RISC-V. The `align()` at most creates 4-byte padding, so a `NativeInstruction::instruction_size` is enough. I am considering pre-calculating the total trampoline sizes and emitting them in batches, but maybe after this one, for this is a quick fix to unblock https://github.com/openjdk/jdk/pull/11188. Please see that thread. The `assert_alignment(pc());` added in the RISC-V part shows that RVC doesn't change the trampoline stub / static stub logic, so there is no need to adjust the trampoline size for it. [1] Tested AArch64 hotspot tier1~3, and 4 is still running; tested RISC-V hotspot tier1~2, and 3~4 are still running. 
Thanks, Xiaolin [1] https://github.com/openjdk/jdk/blob/2deb318c9f047ec5a4b160d66a4b52f93688ec42/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L3125-L3126 ------------- Commit messages: - Fix simply Changes: https://git.openjdk.org/jdk/pull/11414/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11414&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297763 Stats: 13 lines in 3 files changed: 13 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/11414.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11414/head:pull/11414 PR: https://git.openjdk.org/jdk/pull/11414 From fjiang at openjdk.org Tue Nov 29 14:27:02 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 29 Nov 2022 14:27:02 GMT Subject: RFR: 8297697: RISC-V: Add support for SATP mode detection [v3] In-Reply-To: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> References: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> Message-ID: <lEyL9get5mVAM3E9yw4hf-uSHGD45a6P6CfwM10KIn4=.ef4f6a41-a7da-4e49-8193-11caec0c244b@github.com> > RISC-V gets sv57-based virtual memory support since Linux 5.18 [1]. There are some reports of the OpenJDK RISC-V port crashing on Linux 5.18+ with QEMU-system 7.10+ when sv57 was enabled [2][3] as currently RISC-V port only supports up to sv48. > As discussed in [3], given the fact that there are no existing boards or hardware even support anything more than sv48, > we decide to add detection for SATP (Supervisor Address Translation and Protection) mode at JVM startup time if possible and explicitly issue a warning and stop early when sv57 is enabled. 
> > When sv57 is enabled, the output of java -version would be: > > > root at qemuriscv64:~# jdk/bin/java -version > Error occurred during initialization of VM > Unsupported satp mode: sv57 > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa5b537b0ecc16992577b013f11112d54c7ce869 > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000639.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000681.html > > Testing: > > - QEMU-system with sv48/sv57-enabled Linux image `-version` test > - HiFive Unmatched board (sv39) `-version` test Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: remove sv32 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11388/files - new: https://git.openjdk.org/jdk/pull/11388/files/abfa5f97..297ce94b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11388&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11388&range=01-02 Stats: 6 lines in 3 files changed: 0 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/11388.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11388/head:pull/11388 PR: https://git.openjdk.org/jdk/pull/11388 From fyang at openjdk.org Tue Nov 29 14:37:27 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 29 Nov 2022 14:37:27 GMT Subject: RFR: 8297697: RISC-V: Add support for SATP mode detection [v3] In-Reply-To: <lEyL9get5mVAM3E9yw4hf-uSHGD45a6P6CfwM10KIn4=.ef4f6a41-a7da-4e49-8193-11caec0c244b@github.com> References: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> <lEyL9get5mVAM3E9yw4hf-uSHGD45a6P6CfwM10KIn4=.ef4f6a41-a7da-4e49-8193-11caec0c244b@github.com> Message-ID: <-d3C9ujD6NSMiDsWIZn50__zwNJG8KAgxI2GDK8Nxw0=.becd2042-b010-4a36-9aaf-2ae952563cd0@github.com> On Tue, 29 Nov 2022 14:27:02 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: >> RISC-V gets sv57-based virtual memory support since Linux 
5.18 [1]. There are some reports of the OpenJDK RISC-V port crashing on Linux 5.18+ with QEMU-system 7.10+ when sv57 was enabled [2][3] as currently RISC-V port only supports up to sv48. >> As discussed in [3], given the fact that there are no existing boards or hardware even support anything more than sv48, >> we decide to add detection for SATP (Supervisor Address Translation and Protection) mode at JVM startup time if possible and explicitly issue a warning and stop early when sv57 is enabled. >> >> When sv57 is enabled, the output of java -version would be: >> >> >> root at qemuriscv64:~# jdk/bin/java -version >> Error occurred during initialization of VM >> Unsupported satp mode: sv57 >> >> >> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa5b537b0ecc16992577b013f11112d54c7ce869 >> [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000639.html >> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000681.html >> >> Testing: >> >> - QEMU-system with sv48/sv57-enabled Linux image `-version` test >> - HiFive Unmatched board (sv39) `-version` test > > Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: > > remove sv32 Updated change looks fine. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
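For readers following along, the startup check described in the quoted change could look roughly like the sketch below. The thread does not say where the mode is read from. This sketch assumes a `/proc/cpuinfo`-style text with a line such as `mmu : sv57`, and it treats an undetectable mode as acceptable because the description says detection happens "if possible". `SatpMode`, `parse_satp_mode` and `is_supported` are hypothetical names, not the ones used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

enum class SatpMode { Unknown, Sv39, Sv48, Sv57 };

// Scans cpuinfo-like text for a line that starts with "mmu" and maps the
// value after the colon to a mode. The input format is an assumption.
inline SatpMode parse_satp_mode(const std::string& cpuinfo_text) {
  std::istringstream in(cpuinfo_text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, 3, "mmu") != 0) continue;
    std::size_t colon = line.find(':');
    if (colon == std::string::npos) continue;
    std::size_t start = line.find_first_not_of(" \t", colon + 1);
    if (start == std::string::npos) continue;
    std::size_t end = line.find_last_not_of(" \t\r");
    std::string value = line.substr(start, end - start + 1);
    if (value == "sv39") return SatpMode::Sv39;
    if (value == "sv48") return SatpMode::Sv48;
    if (value == "sv57") return SatpMode::Sv57;
  }
  return SatpMode::Unknown;
}

// Only sv57 is rejected here; an unknown mode lets startup continue.
inline bool is_supported(SatpMode mode) {
  return mode != SatpMode::Sv57;
}
```

A caller would read the file once during VM initialization and stop with an "Unsupported satp mode" message when `is_supported` returns false.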
PR: https://git.openjdk.org/jdk/pull/11388 From qamai at openjdk.org Tue Nov 29 14:38:57 2022 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 29 Nov 2022 14:38:57 GMT Subject: RFR: 8292289: [vectorapi] Improve the implementation of VectorTestNode [v13] In-Reply-To: <IcR6eOr6iM1b9B1I1zUQnI2J9hQAmdSRHYc9CdIGW5E=.3b8dd64d-2eff-4659-b75a-81c4b008e41a@github.com> References: <IcR6eOr6iM1b9B1I1zUQnI2J9hQAmdSRHYc9CdIGW5E=.3b8dd64d-2eff-4659-b75a-81c4b008e41a@github.com> Message-ID: <955ScdreoJQ7PG5cXUmly_giKjOJx8ouU8oy1DX_GEA=.7c59dbbb-4a3b-4f35-a951-4cf0aaa6a047@github.com> > This patch modifies the node generation of `VectorSupport::test` to emit a `CMoveINode`, which is picked up by `BoolNode::Ideal(PhaseGVN*, bool)` to connect the `VectorTestNode` directly to the `BoolNode`, removing the redundant operations of materialising the test result in a GP register and doing a `CmpI` to get back the flags. As a result, `VectorMask<T>::alltrue` is compiled into machine code: > > vptest xmm0, xmm1 > jb if_true > if_false: > > instead of: > > vptest xmm0, xmm1 > setb r10 > movzbl r10 > testl r10 > jne if_true > if_false: > > The results of `jdk.incubator.vector.ArrayMismatchBenchmark` show noticeable improvements: > > Before After > Benchmark Prefix Size Mode Cnt Score Error Score Error Units Change > ArrayMismatchBenchmark.mismatchVectorByte 0.5 9 thrpt 10 217345.383 ± 8316.444 222279.381 ± 2660.983 ops/ms +2.3% > ArrayMismatchBenchmark.mismatchVectorByte 0.5 257 thrpt 10 113918.406 ± 1618.836 116268.691 ± 1291.899 ops/ms +2.1% > ArrayMismatchBenchmark.mismatchVectorByte 0.5 100000 thrpt 10 702.066 ± 72.862 797.806 ± 16.429 ops/ms +13.6% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 9 thrpt 10 146096.564 ± 2401.258 145338.910 ± 687.453 ops/ms -0.5% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 257 thrpt 10 60598.181 ± 1259.397 69041.519 ± 1073.156 ops/ms +13.9% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 100000 thrpt 10 316.814 ± 10.975 408.770 ±
5.281 ops/ms +29.0% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 9 thrpt 10 195674.549 ± 1200.166 188482.433 ± 1872.076 ops/ms -3.7% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 257 thrpt 10 44357.169 ± 473.013 42293.411 ± 2838.255 ops/ms -4.7% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 100000 thrpt 10 68.199 ± 5.410 67.628 ± 3.241 ops/ms -0.8% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 9 thrpt 10 107722.450 ± 1677.607 111060.400 ± 982.230 ops/ms +3.1% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 257 thrpt 10 16692.645 ± 1002.599 21440.506 ± 1618.266 ops/ms +28.4% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 100000 thrpt 10 32.984 ± 0.548 33.202 ± 2.365 ops/ms +0.7% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 9 thrpt 10 335458.217 ± 3154.842 379944.254 ± 5703.134 ops/ms +13.3% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 257 thrpt 10 58505.302 ± 786.312 56721.368 ± 2497.052 ops/ms -3.0% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 100000 thrpt 10 133.037 ± 11.415 139.537 ± 4.667 ops/ms +4.9% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 9 thrpt 10 117943.802 ± 2281.349 112409.365 ± 2110.055 ops/ms -4.7% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 257 thrpt 10 27060.015 ± 795.619 33756.613 ± 826.533 ops/ms +24.7% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 100000 thrpt 10 57.558 ± 8.927 66.951 ± 4.381 ops/ms +16.3% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 9 thrpt 10 182963.715 ± 1042.497 182438.405 ± 2120.832 ops/ms -0.3% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 257 thrpt 10 36672.215 ± 614.821 35397.398 ± 1609.235 ops/ms -3.5% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 100000 thrpt 10 66.438 ± 2.142 65.427 ± 2.270 ops/ms -1.5% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 9 thrpt 10 110393.047 ± 497.853 115165.845 ± 5381.674 ops/ms +4.3% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 257 thrpt 10 14720.765 ± 661.350 19871.096 ±
201.464 ops/ms +35.0% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 100000 thrpt 10 30.760 ? 0.821 31.933 ? 1.352 ops/ms +3.8% > > I have not been able to conduct throughout testing on AVX512 and Aarch64 so any help would be invaluable. Thank you very much. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: - Merge branch 'master' into improveVTest - Merge branch 'master' into improveVTest - redundant casts - remove untaken code paths - Merge branch 'master' into improveVTest - Merge branch 'master' into improveVTest - Merge branch 'master' into improveVTest - fix merge problems - Merge branch 'master' into improveVTest - refactor x86 - ... and 20 more: https://git.openjdk.org/jdk/compare/2f83b5c4...1fec3d30 ------------- Changes: https://git.openjdk.org/jdk/pull/9855/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9855&range=12 Stats: 494 lines in 23 files changed: 215 ins; 170 del; 109 mod Patch: https://git.openjdk.org/jdk/pull/9855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9855/head:pull/9855 PR: https://git.openjdk.org/jdk/pull/9855 From qamai at openjdk.org Tue Nov 29 14:40:45 2022 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 29 Nov 2022 14:40:45 GMT Subject: RFR: 8292289: [vectorapi] Improve the implementation of VectorTestNode [v12] In-Reply-To: <1qzngp8Z8spVxoU3C8PxQgqkCJFw3anZqp8_mn8qI2s=.2db33f71-30cf-4365-9ba6-d05146fc8771@github.com> References: <IcR6eOr6iM1b9B1I1zUQnI2J9hQAmdSRHYc9CdIGW5E=.3b8dd64d-2eff-4659-b75a-81c4b008e41a@github.com> <1qzngp8Z8spVxoU3C8PxQgqkCJFw3anZqp8_mn8qI2s=.2db33f71-30cf-4365-9ba6-d05146fc8771@github.com> Message-ID: <yVo-DYr_O5FOzzG0KZT5a_X6hPV_q969Q-edAFePz5I=.821e4e15-4f0b-4bc1-aeb2-d9aafb316ba9@github.com> On Wed, 12 Oct 2022 11:58:47 GMT, Quan Anh Mai <qamai at openjdk.org> wrote: >> This patch modifies the node generation of `VectorSupport::test` to emit a `CMoveINode`, which is picked up by 
`BoolNode::Ideal(PhaseGVN*, bool)` to connect the `VectorTestNode` directly to the `BoolNode`, removing the redundant operations of materialising the test result in a GP register and doing a `CmpI` to get back the flags. As a result, `VectorMask<T>::alltrue` is compiled into machine code:
>>
>>     vptest xmm0, xmm1
>>     jb if_true
>>     if_false:
>>
>> instead of:
>>
>>     vptest xmm0, xmm1
>>     setb r10
>>     movzbl r10
>>     testl r10
>>     jne if_true
>>     if_false:
>>
>> The results of `jdk.incubator.vector.ArrayMismatchBenchmark` show noticeable improvements:
>>
>>                                                                          Before                 After
>> Benchmark                                    Prefix    Size  Mode   Cnt       Score      Error       Score      Error  Units   Change
>> ArrayMismatchBenchmark.mismatchVectorByte       0.5       9  thrpt   10  217345.383 ± 8316.444  222279.381 ± 2660.983  ops/ms   +2.3%
>> ArrayMismatchBenchmark.mismatchVectorByte       0.5     257  thrpt   10  113918.406 ± 1618.836  116268.691 ± 1291.899  ops/ms   +2.1%
>> ArrayMismatchBenchmark.mismatchVectorByte       0.5  100000  thrpt   10     702.066 ±   72.862     797.806 ±   16.429  ops/ms  +13.6%
>> ArrayMismatchBenchmark.mismatchVectorByte       1.0       9  thrpt   10  146096.564 ± 2401.258  145338.910 ±  687.453  ops/ms   -0.5%
>> ArrayMismatchBenchmark.mismatchVectorByte       1.0     257  thrpt   10   60598.181 ± 1259.397   69041.519 ± 1073.156  ops/ms  +13.9%
>> ArrayMismatchBenchmark.mismatchVectorByte       1.0  100000  thrpt   10     316.814 ±   10.975     408.770 ±    5.281  ops/ms  +29.0%
>> ArrayMismatchBenchmark.mismatchVectorDouble     0.5       9  thrpt   10  195674.549 ± 1200.166  188482.433 ± 1872.076  ops/ms   -3.7%
>> ArrayMismatchBenchmark.mismatchVectorDouble     0.5     257  thrpt   10   44357.169 ±  473.013   42293.411 ± 2838.255  ops/ms   -4.7%
>> ArrayMismatchBenchmark.mismatchVectorDouble     0.5  100000  thrpt   10      68.199 ±    5.410      67.628 ±    3.241  ops/ms   -0.8%
>> ArrayMismatchBenchmark.mismatchVectorDouble     1.0       9  thrpt   10  107722.450 ± 1677.607  111060.400 ±  982.230  ops/ms   +3.1%
>> ArrayMismatchBenchmark.mismatchVectorDouble     1.0     257  thrpt   10   16692.645 ± 1002.599   21440.506 ± 1618.266  ops/ms  +28.4%
>> ArrayMismatchBenchmark.mismatchVectorDouble     1.0  100000  thrpt   10      32.984 ±    0.548      33.202 ±    2.365  ops/ms   +0.7%
>> ArrayMismatchBenchmark.mismatchVectorInt        0.5       9  thrpt   10  335458.217 ± 3154.842  379944.254 ± 5703.134  ops/ms  +13.3%
>> ArrayMismatchBenchmark.mismatchVectorInt        0.5     257  thrpt   10   58505.302 ±  786.312   56721.368 ± 2497.052  ops/ms   -3.0%
>> ArrayMismatchBenchmark.mismatchVectorInt        0.5  100000  thrpt   10     133.037 ±   11.415     139.537 ±    4.667  ops/ms   +4.9%
>> ArrayMismatchBenchmark.mismatchVectorInt        1.0       9  thrpt   10  117943.802 ± 2281.349  112409.365 ± 2110.055  ops/ms   -4.7%
>> ArrayMismatchBenchmark.mismatchVectorInt        1.0     257  thrpt   10   27060.015 ±  795.619   33756.613 ±  826.533  ops/ms  +24.7%
>> ArrayMismatchBenchmark.mismatchVectorInt        1.0  100000  thrpt   10      57.558 ±    8.927      66.951 ±    4.381  ops/ms  +16.3%
>> ArrayMismatchBenchmark.mismatchVectorLong       0.5       9  thrpt   10  182963.715 ± 1042.497  182438.405 ± 2120.832  ops/ms   -0.3%
>> ArrayMismatchBenchmark.mismatchVectorLong       0.5     257  thrpt   10   36672.215 ±  614.821   35397.398 ± 1609.235  ops/ms   -3.5%
>> ArrayMismatchBenchmark.mismatchVectorLong       0.5  100000  thrpt   10      66.438 ±    2.142      65.427 ±    2.270  ops/ms   -1.5%
>> ArrayMismatchBenchmark.mismatchVectorLong       1.0       9  thrpt   10  110393.047 ±  497.853  115165.845 ± 5381.674  ops/ms   +4.3%
>> ArrayMismatchBenchmark.mismatchVectorLong       1.0     257  thrpt   10   14720.765 ±  661.350   19871.096 ±  201.464  ops/ms  +35.0%
>> ArrayMismatchBenchmark.mismatchVectorLong       1.0  100000  thrpt   10      30.760 ±    0.821      31.933 ±    1.352  ops/ms   +3.8%
>>
>> I have not been able to conduct thorough testing on AVX512 and AArch64 so any help would be invaluable. Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits:
>
>  - Merge branch 'master' into improveVTest
>  - redundant casts
>  - remove untaken code paths
>  - Merge branch 'master' into improveVTest
>  - Merge branch 'master' into improveVTest
>  - Merge branch 'master' into improveVTest
>  - fix merge problems
>  - Merge branch 'master' into improveVTest
>  - refactor x86
>  - revert renaming temp
>  - ... and 19 more: https://git.openjdk.org/jdk/compare/86ec158d...05c1b9f5

May I have another review for this PR, please? Thank you very much.

-------------

PR: https://git.openjdk.org/jdk/pull/9855

From jnimeh at openjdk.org Tue Nov 29 14:45:18 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Tue, 29 Nov 2022 14:45:18 GMT Subject: Integrated: 8247645: ChaCha20 intrinsics In-Reply-To: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> References: <oqKKgLvoD1R7Rqt682QnQvSNXYCvZyUwOaEqArSM2vw=.d8fef0af-b463-4ead-97f3-1ef1f456af85@github.com> Message-ID: <7TC_Fp-S_S05XrWMZz1_tfLEwiCkAEw5iFERW__0s9Y=.cd1f838e-c893-4cb4-b54c-1a8667247e36@github.com>

On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:

> This PR delivers ChaCha20 intrinsics that accelerate the core block function that generates key stream from the key, counter and nonce. Intrinsics have been written for the following platforms and instruction sets:
>
> - x86_64: AVX, AVX2 and AVX512
> - aarch64: platforms that support the advanced SIMD instructions
>
> Note: Microbenchmark results moved to a comment in the PR so we don't have to see it in every email.
>
> Special thanks to the folks who have made many helpful comments while this PR was in draft form.

This pull request has now been integrated.
Changeset: cd6bebbf Author: Jamil Nimeh <jnimeh at openjdk.org> URL: https://git.openjdk.org/jdk/commit/cd6bebbf34215723fad1d6bfe070a409351920c1 Stats: 1596 lines in 28 files changed: 1558 ins; 6 del; 32 mod 8247645: ChaCha20 intrinsics Reviewed-by: sviswanathan, ngasson, vlivanov, ascarpino ------------- PR: https://git.openjdk.org/jdk/pull/7702 From cstein at openjdk.org Tue Nov 29 14:52:57 2022 From: cstein at openjdk.org (Christian Stein) Date: Tue, 29 Nov 2022 14:52:57 GMT Subject: RFR: 8296710: Update to use jtreg 7.1 Message-ID: <4t-uTHoUVlflpzDnfuHO7TAnix7nNO1MGF28CJBjZBo=.deb056ac-6571-4c79-a272-680b923d58c5@github.com> Please review the change to update to using jtreg `7.1`. The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. This pull request was created by copying the following and using `7.1` at appropriate places: - https://github.com/openjdk/jdk/pull/9393 ------------- Commit messages: - 8296710: Update to use jtreg 7.1 Changes: https://git.openjdk.org/jdk/pull/11416/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11416&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296710 Stats: 9 lines in 8 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/11416.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11416/head:pull/11416 PR: https://git.openjdk.org/jdk/pull/11416 From ngasson at openjdk.org Tue Nov 29 14:56:30 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Tue, 29 Nov 2022 14:56:30 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v6] In-Reply-To: <w2IYNiW52sxcbY2klozYGqND8GNZyqqxtr2o0vmYs8Q=.2c5df2f0-f4c7-488e-86e8-8e06e52f22de@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> 
<w2IYNiW52sxcbY2klozYGqND8GNZyqqxtr2o0vmYs8Q=.2c5df2f0-f4c7-488e-86e8-8e06e52f22de@github.com> Message-ID: <GbO9xMnZjj8GxSp_lcuA603LAnDreMStC0Ir6sQeTm4=.3e308fe8-5e4f-41fd-ba61-4af3085b9e13@github.com> On Tue, 29 Nov 2022 13:55:36 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Improve assembler test generation for eor3 Marked as reviewed by ngasson (Reviewer). 
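The equivalence the quoted description relies on can be sketched in scalar C++. This is only a model of the per-lane arithmetic, not HotSpot code and not the backend rule itself; the function names `eor3` and `two_eors` are made up for illustration.

```cpp
#include <cstdint>

// Scalar model of what a three-way exclusive OR computes for one lane.
// (Hypothetical helper; the real instruction operates on whole vectors.)
inline std::uint64_t eor3(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
    return a ^ b ^ c;
}

// The two chained operations from the quoted example, written out step by step.
inline std::uint64_t two_eors(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
    a = a ^ b;  // eor a, a, b
    a = a ^ c;  // eor a, a, c
    return a;
}
```

Because XOR is associative, both functions return the same value for any inputs, which is why two consecutive "eor" operations can be folded into one "eor3".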
------------- PR: https://git.openjdk.org/jdk/pull/10407 From alanb at openjdk.org Tue Nov 29 16:09:24 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 29 Nov 2022 16:09:24 GMT Subject: RFR: 8296710: Update to use jtreg 7.1 In-Reply-To: <4t-uTHoUVlflpzDnfuHO7TAnix7nNO1MGF28CJBjZBo=.deb056ac-6571-4c79-a272-680b923d58c5@github.com> References: <4t-uTHoUVlflpzDnfuHO7TAnix7nNO1MGF28CJBjZBo=.deb056ac-6571-4c79-a272-680b923d58c5@github.com> Message-ID: <H9LOzHSfN7PLpiF6oYdnlHFQLasj6FeRgZKUKm3XyXs=.591ef716-45eb-4bf5-a6d5-d9ebe887d82a@github.com> On Tue, 29 Nov 2022 14:44:12 GMT, Christian Stein <cstein at openjdk.org> wrote: > Please review the change to update to using jtreg `7.1`. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. > > This pull request was created by copying the following and using `7.1` at appropriate places: > - https://github.com/openjdk/jdk/pull/9393 Can you confirm that you've run all the tests with the change? Sometimes these updates need changes to a small number of tests. 
-------------

PR: https://git.openjdk.org/jdk/pull/11416

From jwaters at openjdk.org Tue Nov 29 17:09:28 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Nov 2022 17:09:28 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <3mBOtLQz_ulylm0XJkhUBwzPEV7AoPmMo20facw9Xn4=.c431ac9c-4f1f-4db6-bb46-745987f06777@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <0fVP40VVRuOoZCEJ1M3BLubshBHbD4m_lj-j1qaGTTk=.391ade50-b3fe-4d4e-ae71-ba8a975a31cd@github.com> <mowWW-lO9a5Zo4iT-sbh1YTZFXO4UjKuJK7OpfrhFFo=.c7e5430a-de42-4465-ba85-10dfb8c71184@github.com> <UwMOA0K5cYSIeTkRgIl6QlPe2iTTA5z0vCH3jzUmx4E=.2b35fc6c-e96a-4807-863f-583631128a4e@github.com> <klNxsMmpOjXNtFhaX2Kqt-YqYSBCyCAoJkqnmFr_IeY=.b55c0f1a-4331-4d65-af05-a20b59f73f57@github.com> <3mBOtLQz_ulylm0XJkhUBwzPEV7AoPmMo20facw9Xn4=.c431ac9c-4f1f-4db6-bb46-745987f06777@github.com> Message-ID: <WgvP0-stpzWV4Z7nlCb2_bBFnP4W-siUmkatZ4I-2l0=.43cbd4c2-3475-4034-93ee-26a370be0016@github.com>

On Wed, 23 Nov 2022 21:36:24 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> I think the problem here is the friend declaration, which doesn't look like it's needed and could be deleted.
>
> Digging into this some more, the friend declaration exists to provide access to the private `os::win32::enum Ept`.
>
> One obvious and cheap solution to that would be to make that enum public. I think that would be an improvement vs the current friend declaration. But there are some other things one could complain about there, such as the type of the function requiring a complicated function pointer cast where it's used. Here's a patch that I think cleans this up.
>
>
> diff --git a/src/hotspot/os/windows/os_windows.cpp b/src/hotspot/os/windows/os_windows.cpp
> index 0651f0868f3..bf9e759b1d6 100644
> --- a/src/hotspot/os/windows/os_windows.cpp
> +++ b/src/hotspot/os/windows/os_windows.cpp
> @@ -511,7 +511,9 @@ JNIEXPORT
>  LONG WINAPI topLevelExceptionFilter(struct _EXCEPTION_POINTERS* exceptionInfo);
>
>  // Thread start routine for all newly created threads
> -static unsigned __stdcall thread_native_entry(Thread* thread) {
> +// Called with the associated Thread* as the argument.
> +unsigned __stdcall os::win32::thread_native_entry(void* t) {
> +  Thread* thread = static_cast<Thread*>(t);
>
>    thread->record_stack_base_and_size();
>    thread->initialize_thread_current();
> @@ -744,7 +746,7 @@ bool os::create_thread(Thread* thread, ThreadType thr_type,
>    thread_handle =
>      (HANDLE)_beginthreadex(NULL,
>                             (unsigned)stack_size,
> -                           (unsigned (__stdcall *)(void*)) thread_native_entry,
> +                           &os::win32::thread_native_entry,
>                             thread,
>                             initflag,
>                             &thread_id);
> diff --git a/src/hotspot/os/windows/os_windows.hpp b/src/hotspot/os/windows/os_windows.hpp
> index 94d7c3c5e2d..197797078d7 100644
> --- a/src/hotspot/os/windows/os_windows.hpp
> +++ b/src/hotspot/os/windows/os_windows.hpp
> @@ -36,7 +36,6 @@ typedef void (*signal_handler_t)(int);
>
>  class os::win32 {
>    friend class os;
> -  friend unsigned __stdcall thread_native_entry(Thread*);
>
>   protected:
>    static int _processor_type;
> @@ -70,6 +69,10 @@ class os::win32 {
>    static HINSTANCE load_Windows_dll(const char* name, char *ebuf, int ebuflen);
>
>   private:
> +  // The handler passed to _beginthreadex().
> +  // Called with the associated Thread* as the argument.
> +  static unsigned __stdcall thread_native_entry(void*);
> +
>    enum Ept { EPT_THREAD, EPT_PROCESS, EPT_PROCESS_DIE };
>    // Wrapper around _endthreadex(), exit() and _exit()
>    static int exit_process_or_thread(Ept what, int exit_code);

The issue with that would be that thread_native_entry is declared as static to the compilation unit on the other operating systems as well, and making it a static member of the win32 class instead would break this convention. I'm not sure whether there's a reason why all of them are declared like this.

-------------

PR: https://git.openjdk.org/jdk/pull/11081

From bkilambi at openjdk.org Tue Nov 29 17:15:28 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 29 Nov 2022 17:15:28 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v6] In-Reply-To: <w2IYNiW52sxcbY2klozYGqND8GNZyqqxtr2o0vmYs8Q=.2c5df2f0-f4c7-488e-86e8-8e06e52f22de@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> <w2IYNiW52sxcbY2klozYGqND8GNZyqqxtr2o0vmYs8Q=.2c5df2f0-f4c7-488e-86e8-8e06e52f22de@github.com> Message-ID: <xsSmLgaO6D7NO1o6I-JVew-fUf0fVQOQS_gVMbPVz3Q=.a33597d0-36f6-4e4e-abae-bbb23de46824@github.com>

On Tue, 29 Nov 2022 13:55:36 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

>> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example -
>>
>> eor a, a, b
>> eor a, a, c
>>
>> can be optimized to single instruction - `eor3 a, b, c`
>>
>> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch.
Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Improve assembler test generation for eor3 Thank you for all the reviews. ------------- PR: https://git.openjdk.org/jdk/pull/10407 From bkilambi at openjdk.org Tue Nov 29 17:20:48 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 29 Nov 2022 17:20:48 GMT Subject: Integrated: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension In-Reply-To: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> References: <Av7Yr_MaH9-lozULxqDQyy4pdP0SXy2MWQYkQhWTp0Y=.95cdc9c8-0ea7-4337-ac72-11f58a17ca73@github.com> Message-ID: <lMRSoQvwDvtSaakE6nK6JcBRRa9Gb7d5M-vS37VfYDM=.42a00d36-a2db-4a8c-9da8-e72eb711c218@github.com> On Fri, 23 Sep 2022 11:13:40 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote: > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. 
For example - > > eor a, a, b > eor a, a, c > > can be optimized to single instruction - `eor3 a, b, c` > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - > > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > > The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. This pull request has now been integrated. Changeset: 54e6d6aa Author: Bhavana Kilambi <bkilambi at openjdk.org> Committer: Nick Gasson <ngasson at openjdk.org> URL: https://git.openjdk.org/jdk/commit/54e6d6aaeb5dec2dc1b9fb3ac9b34c8621df506d Stats: 325 lines in 7 files changed: 290 ins; 0 del; 35 mod 8293488: Add EOR3 backend rule for aarch64 SHA3 extension Reviewed-by: haosun, njian, eliu, aturbanov, ngasson ------------- PR: https://git.openjdk.org/jdk/pull/10407 From kvn at openjdk.org Tue Nov 29 17:28:38 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Nov 2022 17:28:38 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v3] In-Reply-To: <0NmyytQNPzQ7YJowhCYd4K-nm-ahIftkvOi0o-e5PGE=.81c3d51e-c54c-4e90-8b4b-6d597bfb75cf@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <H7qFAMewkLJ4IWrVipLemv1iBgI5qrWOM1pfJ0p6hGk=.315fce9d-2c7d-42b8-a569-c74d8c7097f2@github.com> <WK7Sg9jDAwczPdU4Hax_iFJHBpipKrTBwAyXuX7IdlQ=.32813ab5-8c25-4dfe-9bc9-90d17c462af8@github.com> 
<syTpW1xc6IoV30N1_PLphGvd9jePaErgIFJ_bhCJoqU=.8ca9e2f2-1415-4514-9677-e319d89b05c0@github.com> <0NmyytQNPzQ7YJowhCYd4K-nm-ahIftkvOi0o-e5PGE=.81c3d51e-c54c-4e90-8b4b-6d597bfb75cf@github.com> Message-ID: <Dv9reaiRNHZe2ZTmzcBKdpIgKz0vj9nukGRPa5NpbtY=.edb7e135-5e31-4632-adaa-6cdc4e0260f3@github.com> On Wed, 23 Nov 2022 10:54:27 GMT, Andrew Haley <aph at openjdk.org> wrote: >>> > Changes are good. Can you tell more about `-fsanitize=null` effect on libjvm size and performance of fastdebug build we use in testing? If it is only few percents I am for enabling it in debug build. >>> >>> It might be a bit more than that: it's a test-and-branch on every memory access. Maybe enable it only on a non-optimized build? >> >> I am fine with enabling it for debug VM. But can you give at least some numbers? > >> > > Changes are good. Can you tell more about `-fsanitize=null` effect on libjvm size and performance of fastdebug build we use in testing? If it is only few percents I am for enabling it in debug build. >> > >> > >> > It might be a bit more than that: it's a test-and-branch on every memory access. Maybe enable it only on a non-optimized build? >> >> I am fine with enabling it for debug VM. But can you give at least some numbers? > > Sorry I'm being slow on this. I'm trying to get scoped values done before the fork. @theRealAph you may defer it to next release (`tbd`) since you don't have time to work on this. 
------------- PR: https://git.openjdk.org/jdk/pull/10920 From erikj at openjdk.org Tue Nov 29 17:57:22 2022 From: erikj at openjdk.org (Erik Joelsson) Date: Tue, 29 Nov 2022 17:57:22 GMT Subject: RFR: 8296710: Update to use jtreg 7.1 In-Reply-To: <4t-uTHoUVlflpzDnfuHO7TAnix7nNO1MGF28CJBjZBo=.deb056ac-6571-4c79-a272-680b923d58c5@github.com> References: <4t-uTHoUVlflpzDnfuHO7TAnix7nNO1MGF28CJBjZBo=.deb056ac-6571-4c79-a272-680b923d58c5@github.com> Message-ID: <3MbFV3Yz1mkexhdEMXTtpXXGdr-yJvvM7Z9FyyVu9Do=.2c003086-a5b8-45f7-86d0-4574425677c7@github.com> On Tue, 29 Nov 2022 14:44:12 GMT, Christian Stein <cstein at openjdk.org> wrote: > Please review the change to update to using jtreg `7.1`. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. > > This pull request was created by copying the following and using `7.1` at appropriate places: > - https://github.com/openjdk/jdk/pull/9393 Marked as reviewed by erikj (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/11416 From coleenp at openjdk.org Tue Nov 29 22:54:10 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 29 Nov 2022 22:54:10 GMT Subject: RFR: 8292741: Convert JvmtiTagMapTable to ResourceHashtable In-Reply-To: <c4ophxMSfP2waXPjUz3x4N57crZdQQb-d_xsOIfGmCw=.0de5f1fa-d697-4ca2-aa7a-3c5a4e030236@github.com> References: <Dj9YfY10SLn8dR1Ez3kEQe4tXm1q5W1TS8_4r3gBUsY=.6c38825a-86e5-4575-a509-683786d242a6@github.com> <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> <WC0j16HlKWRzgKhC9xjs_ZQr-42Bk7_cMOOCK3rY-yo=.c5f6e693-7d51-4277-86ab-ad250967041c@github.com> <c4ophxMSfP2waXPjUz3x4N57crZdQQb-d_xsOIfGmCw=.0de5f1fa-d697-4ca2-aa7a-3c5a4e030236@github.com> Message-ID: <RDEJMCCwdwBcdfZEVQPFtrG-MRRwqXRxoZuR8NhnvLs=.c405c798-47fd-48bb-b951-a98fc5f89f4d@github.com> On Mon, 28 Nov 2022 22:16:34 GMT, David Holmes <dholmes at openjdk.org> wrote: >> Coleen's suggestion for efficiency reasons. > > If the comment is meant to suggest some possible future optimisation it should say that. Oh you should uncomment that code. The hashtable that this replaces has the same early cut-out for efficiency. 
You can also have an early cut out if the hashtable is empty, like if (is_empty()) { return 0 } ------------- PR: https://git.openjdk.org/jdk/pull/11288 From coleenp at openjdk.org Tue Nov 29 22:54:13 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 29 Nov 2022 22:54:13 GMT Subject: RFR: 8292741: Convert JvmtiTagMapTable to ResourceHashtable In-Reply-To: <SxTTf0UTexCicPDUny7qiDDizrk4cNvwsJoJORBWcpg=.6fbfa700-2216-4ebe-be6e-4fada9ddb9bb@github.com> References: <Dj9YfY10SLn8dR1Ez3kEQe4tXm1q5W1TS8_4r3gBUsY=.6c38825a-86e5-4575-a509-683786d242a6@github.com> <m0UnY-x47SAtCGdPlMcUKfxd0lWRqURPb3u0wKNCT3w=.97f998ef-48f0-4b4f-a144-822ebcf7dc98@github.com> <WC0j16HlKWRzgKhC9xjs_ZQr-42Bk7_cMOOCK3rY-yo=.c5f6e693-7d51-4277-86ab-ad250967041c@github.com> <c4ophxMSfP2waXPjUz3x4N57crZdQQb-d_xsOIfGmCw=.0de5f1fa-d697-4ca2-aa7a-3c5a4e030236@github.com> <SxTTf0UTexCicPDUny7qiDDizrk4cNvwsJoJORBWcpg=.6fbfa700-2216-4ebe-be6e-4fada9ddb9bb@github.com> Message-ID: <jnGI0gp4Ab1_Jj__Mkduwu6IcBV9zx4JqUtstwNrR8A=.feee8f26-3e76-4832-a683-382c2538e508@github.com> On Tue, 29 Nov 2022 12:22:38 GMT, Afshin Zafari <duke at openjdk.org> wrote: >> Not sure ... didn't we start using C++ lambda's for some of these "closure" operations? @coleenp what is the usual pattern we use for this kind of thing? > > The `unlink` method of `ResourceHashTable` gets an ITER type and calls its `do_entry(Key&,Value&)` method. > If we want to use lambdas, `unlink` should call the input directly and not one of its methods. We don't have a lambda version of unlink (yet). The local struct is fine IMO. 
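The pattern discussed in this thread (an `unlink` that calls `do_entry(Key&, Value&)` on a caller-supplied object, driven by a local struct, with an early cut-out on an empty table) can be sketched as follows. `SimpleTable`, `remove_zero_tags` and `ZeroTagRemover` are hypothetical stand-ins built on `std::unordered_map`; they are not the HotSpot `ResourceHashtable` or the JVMTI tag map, and the "remove when `do_entry` returns true" contract is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a hashtable that offers unlink(ITER*).
template <typename K, typename V>
class SimpleTable {
    std::unordered_map<K, V> _map;
public:
    void put(const K& k, const V& v) { _map[k] = v; }
    bool is_empty() const { return _map.empty(); }
    std::size_t size() const { return _map.size(); }

    // Calls iter->do_entry(key, value) for every entry and removes the
    // entry when the call returns true (assumed contract for this sketch).
    template <typename ITER>
    void unlink(ITER* iter) {
        for (auto it = _map.begin(); it != _map.end(); ) {
            K key = it->first;  // copy: map keys are const
            if (iter->do_entry(key, it->second)) {
                it = _map.erase(it);
            } else {
                ++it;
            }
        }
    }
};

// Removes all entries whose value is zero and returns how many were removed.
inline int remove_zero_tags(SimpleTable<std::string, long>& table) {
    if (table.is_empty()) {
        return 0;  // early cut-out: nothing to walk
    }
    // Local struct used as the "closure"; no lambda overload is needed.
    struct ZeroTagRemover {
        int removed = 0;
        bool do_entry(std::string& key, long& value) {
            (void)key;  // the key is not needed for this predicate
            if (value == 0) {
                removed++;
                return true;   // ask unlink() to drop this entry
            }
            return false;      // keep the entry
        }
    } remover;
    table.unlink(&remover);
    return remover.removed;
}
```

A lambda-based variant would need `unlink` to invoke its argument directly rather than call a named method on it, which is the distinction raised in the quoted reply.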
------------- PR: https://git.openjdk.org/jdk/pull/11288 From coleenp at openjdk.org Wed Nov 30 00:23:17 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 30 Nov 2022 00:23:17 GMT Subject: RFR: 8297600: Check current thread in selected JRT_LEAF methods [v2] In-Reply-To: <N1mTZ0VCshuA_t4ifX4i6U1yhLGgGf0L99OK6vAhbmw=.5149a141-1567-46d6-af32-6449bcaa3c19@github.com> References: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> <N1mTZ0VCshuA_t4ifX4i6U1yhLGgGf0L99OK6vAhbmw=.5149a141-1567-46d6-af32-6449bcaa3c19@github.com> Message-ID: <2GEzNJsx32XuhuFq7rkgF3C6bEUxZ_MzOh7pptfUuHY=.e3791947-02e0-408e-97a6-d2c71744b2b4@github.com> On Mon, 28 Nov 2022 12:26:41 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> With [JDK-8275286](https://bugs.openjdk.org/browse/JDK-8275286), we added the `Thread::current()` checks for most of the JRT entries. But `JRT_LEAF` is still not checked, because not every `JRT_LEAF` carries a `JavaThread` argument. Having assertions there helps for two reasons. First, these methods can be called from the stub/compiler code, which might be erroneous with thread handling (especially in x86_32 that does not have a dedicated thread register). Second, in the post-Loom world, current thread can change suddenly, as evidenced here: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2022-November/060779.html. >> >> We can add the thread checks to relevant `JRT_LEAF` methods that accept `JavaThread*` too. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_64 fastdebug `tier2` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier2` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Revert some additions I thought this was going to be in the JRT_LEAF macro in interfaceSupport.inline.hpp but this seems fine. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11359

From fjiang at openjdk.org Wed Nov 30 00:51:33 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 30 Nov 2022 00:51:33 GMT Subject: RFR: 8297697: RISC-V: Add support for SATP mode detection [v2] In-Reply-To: <goUw2sX_XgOAVjg6HF3eBZDqWSUWlGb3vmIJw0fH_Ek=.b08ebb84-354f-419c-8677-02863438cb35@github.com> References: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> <TjIFSBdChwOzZweXDlzPJpm4UBVzbXZacmVa9Z2McNo=.c2766f6a-bb06-4f44-9308-f93d8259dded@github.com> <goUw2sX_XgOAVjg6HF3eBZDqWSUWlGb3vmIJw0fH_Ek=.b08ebb84-354f-419c-8677-02863438cb35@github.com> Message-ID: <Jyef071VbxhgYb3N629Cr1RhXe9KqtaHKuHkYDPnoKU=.3a201e71-6308-4792-adf9-1423b57d6c85@github.com>

On Tue, 29 Nov 2022 07:09:50 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision:
>>
>> print vm mode string instead of vm mode code
>
> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 41:
>
>> 39:
>> 40: // check if satp.mode is supported, currently supports up to SV48(RV64)/SV32(RV32)
>> 41: if (get_satp_mode() > RISCV64_ONLY(VM_SV48) RISCV32_ONLY(VM_SV32)) {
>
> I am not sure whether it makes sense to consider SV32 here. We only support RV32 Zero for now and haven't seen a similar issue for it yet.

As RISC-V32 only has Zero support, it's safe to remove SV32. We can add it back when the RV32 backend is ready.
------------- PR: https://git.openjdk.org/jdk/pull/11388 From darcy at openjdk.org Wed Nov 30 04:58:24 2022 From: darcy at openjdk.org (Joe Darcy) Date: Wed, 30 Nov 2022 04:58:24 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v32] In-Reply-To: <cJ48CQwDtj894fCv2OVZb0czHdZeP0onHPA8KDIEyjg=.1c2bf974-6039-4cf4-894c-1329f173efbc@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <cJ48CQwDtj894fCv2OVZb0czHdZeP0onHPA8KDIEyjg=.1c2bf974-6039-4cf4-894c-1329f173efbc@github.com> Message-ID: <cLXbImXuUZk90CMUs_9DM_ZAEWTsIxqU21HkQuvq32Q=.0c61d46d-2ea3-4b6d-8c01-c9e4b7a8fc9d@github.com> On Mon, 28 Nov 2022 19:29:08 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments src/java.base/share/classes/java/lang/foreign/Linker.java line 288: > 286: > 287: /** > 288: * {@return A linker option used to denote the index of the first variadic argument layout in a Typo: "A linker" vs "a linker" ------------- PR: https://git.openjdk.org/jdk/pull/10872 From dholmes at openjdk.org Wed Nov 30 05:18:27 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 30 Nov 2022 05:18:27 GMT Subject: RFR: 8287400: Make BitMap range parameter names consistent [v2] In-Reply-To: <6MIyyyl41Wje3Zsvzijy7LQzsZXyvuG5G-vHSttHMho=.1bbd25b3-6ca6-4735-9666-c360329ae2cf@github.com> References: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> <6MIyyyl41Wje3Zsvzijy7LQzsZXyvuG5G-vHSttHMho=.1bbd25b3-6ca6-4735-9666-c360329ae2cf@github.com> Message-ID: 
<kMVNb79ag0JF6LzhMNxXKn8IQEi5P26abHSh6mPvOIY=.66a9e23f-d910-4174-9d44-12a5d1086eb3@github.com> On Tue, 29 Nov 2022 12:10:11 GMT, Afshin Zafari <duke at openjdk.org> wrote: >> The ranges are determined by 'start' and 'end' all over the bitMap.hpp and bitMap.cpp. All instances of other names for start and end are replaced. > > Afshin Zafari has updated the pull request incrementally with two additional commits since the last revision: > > - 8287400: Make BitMap range parameter names consistent > - Revert "8287400: Make BitMap range parameter names consistent" > > This reverts commit 170f75aab91b3299c0be0f38c321d3025aeba7e8. This seems fine now and much simpler. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/11375 From dholmes at openjdk.org Wed Nov 30 06:07:30 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 30 Nov 2022 06:07:30 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <WgvP0-stpzWV4Z7nlCb2_bBFnP4W-siUmkatZ4I-2l0=.43cbd4c2-3475-4034-93ee-26a370be0016@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <d4LIWjQh3RKW81WqqVCiXlQLRJDENyfAJYkQCwWwBZU=.b019f494-0d0c-4da2-8f07-09b6c589984e@github.com> <0fVP40VVRuOoZCEJ1M3BLubshBHbD4m_lj-j1qaGTTk=.391ade50-b3fe-4d4e-ae71-ba8a975a31cd@github.com> <mowWW-lO9a5Zo4iT-sbh1YTZFXO4UjKuJK7OpfrhFFo=.c7e5430a-de42-4465-ba85-10dfb8c71184@github.com> <UwMOA0K5cYSIeTkRgIl6QlPe2iTTA5z0vCH3jzUmx4E=.2b35fc6c-e96a-4807-863f-583631128a4e@github.com> <klNxsMmpOjXNtFhaX2Kqt-YqYSBCyCAoJkqnmFr_IeY=.b55c0f1a-4331-4d65-af05-a20b59f73f57@github.com> <3mBOtLQz_ulylm0XJkhUBwzPEV7AoPmMo20facw9Xn4=.c431ac9c-4f1f-4db6-bb46-745987f06777@github.com> <WgvP0-stpzWV4Z7nlCb2_bBFnP4W-siUmkatZ4I-2l0=.43cbd4c2-3475-4034-93ee-26a370be0016@github.com> Message-ID: <5dO9bO57dyw99J9wxZceMOy1FujzO-JFXWxODl3LlCc=.e37a178a-21c9-4aec-97f0-ec01eb3903b0@github.com> On Tue, 29 Nov 2022 
17:07:02 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> Digging into this some more, the friend declaration exists to provide access to the private `os::win32::enum Ept`. >> >> One obvious and cheap solution to that would be to make that enum public. I think that would be an improvement vs the current friend declaration. But there are some other things one could complain about there, such as the type of the function requiring a complicated function pointer cast where it's used. Here's a patch that I think cleans this up. >> >> >> diff --git a/src/hotspot/os/windows/os_windows.cpp b/src/hotspot/os/windows/os_windows.cpp >> index 0651f0868f3..bf9e759b1d6 100644 >> --- a/src/hotspot/os/windows/os_windows.cpp >> +++ b/src/hotspot/os/windows/os_windows.cpp >> @@ -511,7 +511,9 @@ JNIEXPORT >> LONG WINAPI topLevelExceptionFilter(struct _EXCEPTION_POINTERS* exceptionInfo); >> >> // Thread start routine for all newly created threads >> -static unsigned __stdcall thread_native_entry(Thread* thread) { >> +// Called with the associated Thread* as the argument. 
>> +unsigned __stdcall os::win32::thread_native_entry(void* t) { >> + Thread* thread = static_cast<Thread*>(t); >> >> thread->record_stack_base_and_size(); >> thread->initialize_thread_current(); >> @@ -744,7 +746,7 @@ bool os::create_thread(Thread* thread, ThreadType thr_type, >> thread_handle = >> (HANDLE)_beginthreadex(NULL, >> (unsigned)stack_size, >> - (unsigned (__stdcall *)(void*)) thread_native_entry, >> + &os::win32::thread_native_entry, >> thread, >> initflag, >> &thread_id); >> diff --git a/src/hotspot/os/windows/os_windows.hpp b/src/hotspot/os/windows/os_windows.hpp >> index 94d7c3c5e2d..197797078d7 100644 >> --- a/src/hotspot/os/windows/os_windows.hpp >> +++ b/src/hotspot/os/windows/os_windows.hpp >> @@ -36,7 +36,6 @@ typedef void (*signal_handler_t)(int); >> >> class os::win32 { >> friend class os; >> - friend unsigned __stdcall thread_native_entry(Thread*); >> >> protected: >> static int _processor_type; >> @@ -70,6 +69,10 @@ class os::win32 { >> static HINSTANCE load_Windows_dll(const char* name, char *ebuf, int ebuflen); >> >> private: >> + // The handler passed to _beginthreadex(). >> + // Called with the associated Thread* as the argument. >> + static unsigned __stdcall thread_native_entry(void*); >> + >> enum Ept { EPT_THREAD, EPT_PROCESS, EPT_PROCESS_DIE }; >> // Wrapper around _endthreadex(), exit() and _exit() >> static int exit_process_or_thread(Ept what, int exit_code); > > The issue with that would be that thread_native_entry is declared as static to the compilation unit on other operating systems as well, and having it as a static member on the win32 class instead would end up breaking this convention, for which I'm not sure if there's a reason why all of them are declared like this. The thread entry functions are expected to be plain C functions as we use C library calls to create threads (`_beginthreadex`, `pthread_create`) not C++. They don't need to be visible outside the compilation unit, hence static. 
------------- PR: https://git.openjdk.org/jdk/pull/11081 From dholmes at openjdk.org Wed Nov 30 06:30:18 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 30 Nov 2022 06:30:18 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v2] In-Reply-To: <YWy2NTRtowyJjRBmh4_7OrxDDfX9ElyDVEnrH-q7eG0=.fb141393-4137-431c-8f4b-677e817d4ce1@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> <_3SkZlJhWP_RHEgQDrTfSPp9pv8sS8jM4ewrIhucIcg=.a256713b-1f9a-45a7-a13f-e101ceca297b@github.com> <ilh-e9IZV-lFMQEq3rMbOxX927SNc75syjfkmazMBEI=.a90a961d-0494-491e-9020-a6952501cd17@github.com> <YWy2NTRtowyJjRBmh4_7OrxDDfX9ElyDVEnrH-q7eG0=.fb141393-4137-431c-8f4b-677e817d4ce1@github.com> Message-ID: <lIGuVSmjDi39UeApfZ2ZR2DyeFRABEJYJ4WAFLmCdew=.1f446fb3-8daa-429b-9838-5822287758b3@github.com> On Tue, 29 Nov 2022 07:34:08 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > It uses the exact same mechanism that I now added to Method::build_profiling_method_data, including atomic operations to initialize _method_counters Sorry I somehow missed the `init_method_counters` logic. 
:( ------------- PR: https://git.openjdk.org/jdk/pull/11316 From dholmes at openjdk.org Wed Nov 30 06:34:18 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 30 Nov 2022 06:34:18 GMT Subject: RFR: 8297693: Fix typos in src/hotspot and test/hotspot files [v2] In-Reply-To: <-lbBaUaP4j0boRFXIztrwuaki5AQ5Z-6RhCMSsoHKF0=.026f7ce8-2118-4c26-b629-40adbcf8ce30@github.com> References: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> <-lbBaUaP4j0boRFXIztrwuaki5AQ5Z-6RhCMSsoHKF0=.026f7ce8-2118-4c26-b629-40adbcf8ce30@github.com> Message-ID: <NrLLbf-4wO8jC7rCxyMCKGRpLR2Kjc2WCe29GRRcomY=.695fbedc-be34-4a8f-95a4-59168d67dc7e@github.com> On Tue, 29 Nov 2022 09:15:20 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: >> Can I please get a review for this change which only fixes typos in src/hotspot and test/hotspot files? These changes were originally done by @mernst in PR https://github.com/openjdk/jdk/pull/10029, but given that the other PR touches multiple other files and areas, the progress was stalled. >> >> This PR introduces only hotspot related typo fixes that Michael had proposed in the other PR plus also includes a review suggestion (in one of these files) that was made by Alexey in the other PR. > > Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision: > > Address David's review suggestion Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11386 From jwaters at openjdk.org Wed Nov 30 07:26:46 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 30 Nov 2022 07:26:46 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas [v2] In-Reply-To: <CVHNRNKRV_f2n8F5s0AiQR7lgIrHBHzaUj5ewTDMU7I=.9920de89-7b18-4f22-9ca8-94479d4a292f@github.com> References: <CVHNRNKRV_f2n8F5s0AiQR7lgIrHBHzaUj5ewTDMU7I=.9920de89-7b18-4f22-9ca8-94479d4a292f@github.com> Message-ID: <q-_icded_J6Bu_FntBn_NzYa-o9ZplD4Mr5FmvsZYe4=.eedc699c-a423-4bf6-b97f-ea3367d6ec9d@github.com> > Add alignas to the permitted features set with some restrictions. (Thanks @kimbarrett for the help) Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into alignas - HotSpot Style Guide changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11404/files - new: https://git.openjdk.org/jdk/pull/11404/files/cec044b4..43683f1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11404&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11404&range=00-01 Stats: 5941 lines in 250 files changed: 3584 ins; 1175 del; 1182 mod Patch: https://git.openjdk.org/jdk/pull/11404.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11404/head:pull/11404 PR: https://git.openjdk.org/jdk/pull/11404 From pli at openjdk.org Wed Nov 30 07:29:22 2022 From: pli at openjdk.org (Pengfei Li) Date: Wed, 30 Nov 2022 07:29:22 GMT Subject: RFR: 8297689: Fix incorrect result of Short.reverseBytes() call in loops Message-ID: <yv1aUlZMKpLVbXZmkSqZtWLlDTa-5PeX6pzzBu1Rrb8=.6ef681fe-6dbd-4dab-9ee0-61f59fbeb024@github.com> Recently, we find calling `Short.reverseBytes()` in loops may generate incorrect result if the code is compiled by C2. Below is a simple case to reproduce. 
    class Foo {
      static final int SIZE = 50;
      static int a[] = new int[SIZE];

      static void test() {
        for (int i = 0; i < SIZE; i++) {
          a[i] = Short.reverseBytes((short) a[i]);
        }
      }

      public static void main(String[] args) throws Exception {
        Class.forName("java.lang.Short");
        a[25] = 16;
        test();
        System.out.println(a[25]);
      }
    }

    // $ java -Xint Foo
    // 4096
    // $ java -Xcomp -XX:-TieredCompilation -XX:CompileOnly=Foo.test Foo
    // 268435456

In this case, the `reverseBytes()` call is intrinsified and transformed into a `ReverseBytesS` node. But the C2 compiler then incorrectly vectorizes it into `ReverseBytesV` with int type. C2 `Op_ReverseBytes*` has short, char, int and long versions. Their behaviors are different for different data sizes. In superword, a subword operation itself doesn't have precise data size info. Instead, the data size info comes from memory operations in its use-def chain. Hence, vectorization of `reverseBytes()` is valid only if the data size is consistent with the type size of the caller's class. But the current C2 compiler code lacks fine-grained type checks for `ReverseBytes*` in vector transformation. As a result, a `reverseBytes()` call from the Short or Character class combined with an int load/store gets vectorized incorrectly, as in the case above.

To fix the issue, this patch adds more checks in `VectorNode::opcode()`. T_BYTE is a special case for `Op_ReverseBytes*`. The Java Byte class doesn't have a `reverseBytes()` method, so there's no `Op_ReverseBytesB`. But T_BYTE may still appear in VectorAPI calls. In this patch we still use `Op_ReverseBytesI` for T_BYTE to ensure vector intrinsification succeeds.

Tested with hotspot::hotspot_all_no_apps, jdk tier1~3 and langtools tier1 on x86 and AArch64, no issues were found. 
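For readers following along, the two printed values are consistent with a 16-bit byte swap in the interpreter versus a 32-bit byte swap in the miscompiled loop. The sketch below only illustrates that arithmetic with the public `Short.reverseBytes` and `Integer.reverseBytes` methods; the class and helper names are made up for illustration, and it does not exercise C2 or the superword code touched by the patch.

```java
// Illustration only: ReverseBytesDemo and its helpers are hypothetical names.
class ReverseBytesDemo {
    // Swap the two bytes of the low 16 bits, then widen the short result to
    // int, which is what storing Short.reverseBytes(...) into an int[] does.
    static int reverseAsShort(int value) {
        return Short.reverseBytes((short) value);
    }

    // Swap all four bytes of the 32-bit value, i.e. the int-sized operation.
    static int reverseAsInt(int value) {
        return Integer.reverseBytes(value);
    }

    public static void main(String[] args) {
        int input = 16; // 0x00000010
        System.out.println(reverseAsShort(input)); // 0x1000     -> 4096
        System.out.println(reverseAsInt(input));   // 0x10000000 -> 268435456
    }
}
```

The first value matches the `-Xint` output above and the second matches the `-Xcomp` output, which fits the description of the short operation being vectorized with int type.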
------------- Commit messages: - 8297689: Fix incorrect result of Short.reverseBytes() call in loops Changes: https://git.openjdk.org/jdk/pull/11427/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11427&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297689 Stats: 166 lines in 6 files changed: 160 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/11427.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11427/head:pull/11427 PR: https://git.openjdk.org/jdk/pull/11427 From jwaters at openjdk.org Wed Nov 30 07:40:22 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 30 Nov 2022 07:40:22 GMT Subject: Integrated: 8252584: HotSpot Style Guide should permit alignas In-Reply-To: <CVHNRNKRV_f2n8F5s0AiQR7lgIrHBHzaUj5ewTDMU7I=.9920de89-7b18-4f22-9ca8-94479d4a292f@github.com> References: <CVHNRNKRV_f2n8F5s0AiQR7lgIrHBHzaUj5ewTDMU7I=.9920de89-7b18-4f22-9ca8-94479d4a292f@github.com> Message-ID: <Qd2UP38WqQbr2IsFhKGzd9d-CCG3Doth3GeRqrVdgtA=.8d17dfae-52c3-468e-b688-e6e91fdc0138@github.com> On Tue, 29 Nov 2022 01:03:55 GMT, Julian Waters <jwaters at openjdk.org> wrote: > Add alignas to the permitted features set with some restrictions. (Thanks @kimbarrett for the help) This pull request has now been integrated. 
Changeset: 22f5d014 Author: Julian Waters <jwaters at openjdk.org> URL: https://git.openjdk.org/jdk/commit/22f5d014287a5cae2c0503ab3f9730f64725605a Stats: 94 lines in 2 files changed: 92 ins; 0 del; 2 mod 8252584: HotSpot Style Guide should permit alignas Co-authored-by: Kim Barrett <kbarrett at openjdk.org> Reviewed-by: kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/11404 From jwaters at openjdk.org Wed Nov 30 07:55:24 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 30 Nov 2022 07:55:24 GMT Subject: RFR: 8295146: Clean up native code with newer C/C++ language features [v3] In-Reply-To: <4NLoUD7YZRPKuVc08q_QnsbuPpb0wZIdmAuMN2tqM_c=.a34c13d0-0a6b-495e-b48e-e88debf6b4da@github.com> References: <h7OfjsjMR4UCdsjoU4LJiMhJdBOUCORnEtMY2vBSiII=.faa9c80c-9dc2-47b2-ab1f-e964d04be41b@github.com> <gay-N6xDnfKHcngB9ddJIZD6Jfg2m_ZCzZn1gWPFN-o=.785036e8-1d1d-41d3-bac3-211b9d03cd71@github.com> <XXVqN4ByCrB34JRZSgiNYWsdrwEOTKjo5u81sTFG5bE=.7748c17a-4a13-42aa-b10d-219fe6775da2@github.com> <nVCoCE9doWoQ54qhPSbn1xM79YLzhggfELps4CpI53o=.3af1e9bf-5389-439e-bd78-1b4fce336c2f@github.com> <4NLoUD7YZRPKuVc08q_QnsbuPpb0wZIdmAuMN2tqM_c=.a34c13d0-0a6b-495e-b48e-e88debf6b4da@github.com> Message-ID: <X08uwA9tf91BTq70u3o3koABY-5ibEiJjUGuXECqoJ8=.496742e5-9430-4ff1-9944-9922e4070035@github.com> On Wed, 23 Nov 2022 04:58:38 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: >> Out of curiosity, is there a way to get the discussion on approving the use of alignas back up? I've read through 8250269 briefly and unlike the issues that come with C++ attributes, alignas looks relatively straightforward to switch to, without much effect on existing code. Seems like a bit of a waste to leave the JBS entry sitting on the shelf to me >> >>> The various MSVC-conditional direct uses of __declspec(align(N)) should probably currently be using ATTRIBUTE_ALIGNED. >> >> The instances of `__declspec(align())` changed here are in the native libraries written in C, not within HotSpot itself. 
From what I can see at least HotSpot never uses compiler alignment attributes directly and always strictly sticks to `ATTRIBUTE_ALIGNED` (which is probably a good thing) > >> Out of curiosity, is there a way to get the discussion on approving the use of alignas back up? [...] > > A PR to address JDK-8252584 would be welcomed by me. Just do the process for > Style Guide changes (see the Style Guide or previous PRs for such). I don't > expect it would be very controversial. I think the only reason it hasn't > already happened is because nobody has gotten around to it, or felt the need > for it. > > JDK-8250269 touches a bit more code (mostly in stubGenerator_x86_64 and > macroAssembler_x86_32), but also seems like it should be straightforward. > >> > The various MSVC-conditional direct uses of __declspec(align(N)) should probably currently be using ATTRIBUTE_ALIGNED. >> >> The instances of `__declspec(align())` changed here are in the native libraries written in C, not within HotSpot itself. From what I can see at least HotSpot never uses compiler alignment attributes directly and always strictly sticks to `ATTRIBUTE_ALIGNED` (which is probably a good thing) > > You are right that the Windows-conditionalized uses are in non-HotSpot code. > I missed that context when skimming through the changes. Since Visual Studio > is always C++ (even though the shared files are written as C), using alignas > with appropriate conditionalization in those files should be fine. Resolved, will address re-implementing ATTRIBUTE_ALIGNED with alignas in another Pull Request ------------- PR: https://git.openjdk.org/jdk/pull/11081 From stefan.johansson at oracle.com Wed Nov 30 08:37:50 2022 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 30 Nov 2022 09:37:50 +0100 Subject: Extend Native Memory Tracking over the JDK ? 
(was: Proposal: track zlib native memory usage with NMT) In-Reply-To: <a20d3253-47b6-41dc-b8e4-0b894137ed90@app.fastmail.com> References: <CAA-vtUzvyqb_LZ9d1Z80oSvzyqpi89053O3B+oDZF4oZ85CuZg@mail.gmail.com> <c5db2105-3b78-0b77-779f-011c8649f476@oracle.com> <CAA-vtUxrP49w523EQ3yKFh4Zo5W+HMtqpGcsUxOEMkwZUOQ0zQ@mail.gmail.com> <a20d3253-47b6-41dc-b8e4-0b894137ed90@app.fastmail.com> Message-ID: <4f7deee9-366d-05a2-9268-09a25a138d8d@oracle.com> Hi Carter, Your mail made me pick up an old item from my wishlist: to have native memory tracking information available in JFR recordings. When we, in GC, do improvements to decrease the native memory overhead of our algorithms, NMT is a very good tool to track the progress. We have scripts that sound very similar to what you describe and more than once I've been thinking about adding this information into JFR. But it has not been a priority and the greater value has been unclear. Hearing that others might also benefit from such a change I took a discussion with the JFR team on how to best proceed with this. I have created a branch for this and will probably create a PR for it shortly, but I thought I would drop it here first: https://github.com/kstefanj/jdk/tree/8157023-jfr-events-for-nmt The change adds two new JFR events: one for the total usage and one for the usage of each memory type. These are sent only if Native Memory Tracking is turned on, and they are enabled in the default JFR profile with an interval of 1s. This might change during reviewing but it was a good starting point. With this you will be able to use JFR streaming to access the events from within your running process. I hope this will help your use cases and please let us know if you have any comments or suggestions. Thanks, Stefan On 2022-11-10 16:58, Carter Kozak wrote: > /+serviceability-dev/ > > Firstly, thank you both for your time and work in this space. 
Apologies > if this should be a separate thread, but the new title "Extend Native > Memory Tracking over the JDK" aligns directly with some work I've been > investigating, and I hope my feedback will be helpful for prioritization > of zlib observability as well as the way users think about native memory > tracking in general. > > Observability of native memory in the JVM is critically important, and > becomes even more valuable as the industry shifts to more and smaller > services deployed in right-sized container environments like kubernetes. > Each new JDK release (major and hotfix) offers dramatic improvements, > often based on some form of trade-off. To be clear, I cannot overstate > how impressed I am with quality and velocity of improvement! However, > these trade-offs impact the way that memory is used, and it's a > difficult balance to ensure containers use the correct amount of memory > without being wasteful (over-provisioned) or oomkilled (under-provisioned). > > In production, I have thousands of JVMs running with native memory > tracking summary enabled. Real-time monitoring of the output is painful > and inefficient. Currently the only supported option I'm aware of is > shelling out to create a new jcmd process and parsing the NMT summary > text output periodically. In older releases, it was possible to bypass > the jcmd process by self-attaching, but that was limited in jdk9 by > JDK-8178380 <https://bugs.openjdk.org/browse/JDK-8178380>, and still > required the caller to parse human-readable strings. In fact, attachment > issues in some JDK/environment combinations make automated attachment > /dangerous/ in a way that has crashed the JVM - that may be a story for > another day, but my point is that simple, efficient NMT data collection > would go a very long way. 
Many modern observability tools, especially > those used in container deployments, operate by reading data from within > the jvm process, and relaying it to a storage system (Prometheus may be > the most ubiquitous example). > > For my use-case, I'd love to have a simple API I could invoke from java > code to access structured native-memory-tracking data, similar in a way > to MemoryPoolMXBean for heap pools (although JMX isn't necessary for me, > it aligns with other observability APIs in the JDK). Additionally, I'd > like to provide JFR events to periodically record native memory tracking > metadata when enabled for better out-of-the-box experience with JMC. > I've begun investigating some options for JDK-8182634 > <https://bugs.openjdk.org/browse/JDK-8182634>, but would appreciate > feedback before I propose any sort of code change. > > Thank you all for beginning this discussion, I'm eager to see the ways > the JDK continues to improve upon observability features, and do what > small part I can to help! > > Carter Kozak From thartmann at openjdk.org Wed Nov 30 08:43:29 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 30 Nov 2022 08:43:29 GMT Subject: RFR: 8297689: Fix incorrect result of Short.reverseBytes() call in loops In-Reply-To: <yv1aUlZMKpLVbXZmkSqZtWLlDTa-5PeX6pzzBu1Rrb8=.6ef681fe-6dbd-4dab-9ee0-61f59fbeb024@github.com> References: <yv1aUlZMKpLVbXZmkSqZtWLlDTa-5PeX6pzzBu1Rrb8=.6ef681fe-6dbd-4dab-9ee0-61f59fbeb024@github.com> Message-ID: <8pV4gvVPCq8hnGreDD3Ex80UjgkTqd1PgePPH6zAUqQ=.309fbc0b-3486-41d9-9681-220c9eb51f7e@github.com> On Wed, 30 Nov 2022 07:20:11 GMT, Pengfei Li <pli at openjdk.org> wrote: > Recently, we find calling `Short.reverseBytes()` in loops may generate incorrect result if the code is compiled by C2. Below is a simple case to reproduce. 
> > > class Foo { > static final int SIZE = 50; > static int a[] = new int[SIZE]; > > static void test() { > for (int i = 0; i < SIZE; i++) { > a[i] = Short.reverseBytes((short) a[i]); > } > } > > public static void main(String[] args) throws Exception { > Class.forName("java.lang.Short"); > a[25] = 16; > test(); > System.out.println(a[25]); > } > } > > // $ java -Xint Foo > // 4096 > // $ java -Xcomp -XX:-TieredCompilation -XX:CompileOnly=Foo.test Foo > // 268435456 > > > In this case, the `reverseBytes()` call is intrinsified and transformed into a `ReverseBytesS` node. But then C2 compiler incorrectly vectorizes it into `ReverseBytesV` with int type. C2 `Op_ReverseBytes*` has short, char, int and long versions. Their behaviors are different for different data sizes. In superword, subword operation itself doesn't have precise data size info. Instead, the data size info comes from memory operations in its use-def chain. Hence, vectorization of `reverseBytes()` is valid only if the data size is consistent with the type size of the caller's class. But current C2 compiler code lacks fine-grained type checks for `ReverseBytes*` in vector transformation. It results in `reverseBytes()` call from Short or Character class with int load/store gets vectorized incorrectly in above case. > > To fix the issue, this patch adds more checks in `VectorNode::opcode()`. T_BYTE is a special case for `Op_ReverseBytes*`. As the Java Byte class doesn't have `reverseBytes()` method so there's no `Op_ReverseBytesB`. But T_BYTE may still appear in VectorAPI calls. In this patch we still use `Op_ReverseBytesI` for T_BYTE to ensure vector intrinsification succeeds. > > Tested with hotspot::hotspot_all_no_apps, jdk tier1~3 and langtools tier1 on x86 and AArch64, no issue is found. This looks reasonable to me but I'm not an expert in that code. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11427 From dlong at openjdk.org Wed Nov 30 08:56:22 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Nov 2022 08:56:22 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v2] In-Reply-To: <YWy2NTRtowyJjRBmh4_7OrxDDfX9ElyDVEnrH-q7eG0=.fb141393-4137-431c-8f4b-677e817d4ce1@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> <_3SkZlJhWP_RHEgQDrTfSPp9pv8sS8jM4ewrIhucIcg=.a256713b-1f9a-45a7-a13f-e101ceca297b@github.com> <ilh-e9IZV-lFMQEq3rMbOxX927SNc75syjfkmazMBEI=.a90a961d-0494-491e-9020-a6952501cd17@github.com> <YWy2NTRtowyJjRBmh4_7OrxDDfX9ElyDVEnrH-q7eG0=.fb141393-4137-431c-8f4b-677e817d4ce1@github.com> Message-ID: <DYYb4M9EKHmGR3paCa0XQ0Q2IbX2HfFlIfgOCePWRGo=.f3844b3b-8e72-4a34-a6a4-96ff4d3ce77a@github.com> On Tue, 29 Nov 2022 07:34:08 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote: > Of course, if we are ever going to implement [JDK-8254110](https://bugs.openjdk.org/browse/JDK-8254110), we need to revisit that code, but in that case the assert that I added will trigger. > > What do you think? OK, please add a TODO in JDK-8254110 explaining this so it doesn't get lost. 
------------- PR: https://git.openjdk.org/jdk/pull/11316 From shade at openjdk.org Wed Nov 30 09:08:16 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 30 Nov 2022 09:08:16 GMT Subject: RFR: 8297600: Check current thread in selected JRT_LEAF methods [v2] In-Reply-To: <N1mTZ0VCshuA_t4ifX4i6U1yhLGgGf0L99OK6vAhbmw=.5149a141-1567-46d6-af32-6449bcaa3c19@github.com> References: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> <N1mTZ0VCshuA_t4ifX4i6U1yhLGgGf0L99OK6vAhbmw=.5149a141-1567-46d6-af32-6449bcaa3c19@github.com> Message-ID: <EFZuFhFCuSAoy1ynwnKgMY3y2ZcVXkp6z2UuDXD4lcA=.436f11a2-47d0-4c8b-b03c-a158b0c71e35@github.com> On Mon, 28 Nov 2022 12:26:41 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: >> With [JDK-8275286](https://bugs.openjdk.org/browse/JDK-8275286), we added the `Thread::current()` checks for most of the JRT entries. But `JRT_LEAF` is still not checked, because not every `JRT_LEAF` carries a `JavaThread` argument. Having assertions there helps for two reasons. First, these methods can be called from the stub/compiler code, which might be erroneous with thread handling (especially in x86_32 that does not have a dedicated thread register). Second, in the post-Loom world, current thread can change suddenly, as evidenced here: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2022-November/060779.html. >> >> We can add the thread checks to relevant `JRT_LEAF` methods that accept `JavaThread*` too. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_64 fastdebug `tier2` >> - [x] Linux x86_32 fastdebug `tier1` >> - [x] Linux x86_32 fastdebug `tier2` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Revert some additions Thanks! I am integrating then. 
------------- PR: https://git.openjdk.org/jdk/pull/11359 From sjohanss at openjdk.org Wed Nov 30 09:08:52 2022 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 30 Nov 2022 09:08:52 GMT Subject: RFR: 8297427: Avoid keeping class loaders alive when executing ClassLoaderStatsVMOperation [v2] In-Reply-To: <YdNWoA79MYjB7s-lcu1eYljYAGoV46gjbnYqX0PzJGc=.93d62d9d-0642-4a0d-9f90-b8b1885d4ace@github.com> References: <YdNWoA79MYjB7s-lcu1eYljYAGoV46gjbnYqX0PzJGc=.93d62d9d-0642-4a0d-9f90-b8b1885d4ace@github.com> Message-ID: <qVWmEYZDcXkHs5UN68wzWCFctsz_9zfg1XjdbG7FfKw=.1745c615-98ef-46cd-82df-590fd1673a1c@github.com> > Please review this change to avoid keeping classes alive only due to the `ClassLoaderStatsVMOperation`. > > **Summary** > The `ClassLoaderStatsVMOperation` is gathering statistics about the active class loaders in a safepoint. The way the `ClassLoaderDataGraph` is iterated will keep the class loaders live. This is not really needed since everything is done in a safepoint and nothing needs to be explicitly kept alive. This has not been a problem prior to concurrent class unloading in ZGC. With fully concurrent class unloading a `ClassLoaderStatsVMOperation` can occur during a collection and more classes than needed might be kept alive. This could in turn lead to premature Metaspace OOM. > > The solution is to not keep the class loaders alive due to the iteration in `ClassLoaderStatsVMOperation`. > > **Testing** > * Added a new test that covers the two different ways a class could previously be kept alive by the VM operation. The test passes after the fix but failed before. > * Mach5 tier 1-3 Stefan Johansson has updated the pull request incrementally with three additional commits since the last revision: - Print object to ensure it is kept alive - Revert "Axel comments to use templates" This reverts commit 8800ef089b62cf173147b68adb2ee993b7e72980. - Revert "Missing include for minimal" This reverts commit 424e0f9d831279ba2d1986ebacb499f8d4a6c078. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/11300/files - new: https://git.openjdk.org/jdk/pull/11300/files/424e0f9d..f6dba7f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11300&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11300&range=00-01 Stats: 157 lines in 15 files changed: 72 ins; 69 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/11300.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11300/head:pull/11300 PR: https://git.openjdk.org/jdk/pull/11300 From sjohanss at openjdk.org Wed Nov 30 09:10:38 2022 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 30 Nov 2022 09:10:38 GMT Subject: RFR: 8297427: Avoid keeping class loaders alive when executing ClassLoaderStatsVMOperation In-Reply-To: <YdNWoA79MYjB7s-lcu1eYljYAGoV46gjbnYqX0PzJGc=.93d62d9d-0642-4a0d-9f90-b8b1885d4ace@github.com> References: <YdNWoA79MYjB7s-lcu1eYljYAGoV46gjbnYqX0PzJGc=.93d62d9d-0642-4a0d-9f90-b8b1885d4ace@github.com> Message-ID: <wm9j6IU57X7C6mEapPXxesmzwrqY5P7_VzJGnO_K0JY=.538b64ee-d7e3-453b-858a-08c67f4a0ef8@github.com> On Tue, 22 Nov 2022 20:54:54 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote: > Please review this change to avoid keeping classes alive only due to the `ClassLoaderStatsVMOperation`. > > **Summary** > The `ClassLoaderStatsVMOperation` is gathering statistics about the active class loaders in a safepoint. The way the `ClassLoaderDataGraph` is iterated will keep the class loaders live. This is not really needed since everything is done in a safepoint and nothing needs to be explicitly kept alive. This has not been a problem prior to concurrent class unloading in ZGC. With fully concurrent class unloading a `ClassLoaderStatsVMOperation` can occur during a collection and more classes than needed might be kept alive. This could in turn lead to premature Metaspace OOM. > > The solution is to not keep the class loaders alive due to the iteration in `ClassLoaderStatsVMOperation`. 
> > **Testing** > * Added a new test that covers the two different ways a class could previously be kept alive by the VM operation. The test passes after the fix but failed before. > * Mach5 tier 1-3 Reverted the use of templates after some internal discussions. Also added a print of the object in the test to ensure the class is kept alive even if `-Xcomp` is used and ergonomically GCs are triggered before the one invoked by the test. ------------- PR: https://git.openjdk.org/jdk/pull/11300 From shade at openjdk.org Wed Nov 30 09:12:26 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 30 Nov 2022 09:12:26 GMT Subject: Integrated: 8297600: Check current thread in selected JRT_LEAF methods In-Reply-To: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> References: <gOhrwb0CfIZRgsuj7Tv-7l40LSoglrVtB0opSEFeBAM=.47dc28ea-869a-4b12-b80a-8c8819134cf0@github.com> Message-ID: <4NaGNlP74men4NBqQkg5eRSEtjYo4RCeRM7hrSrTwCs=.0dce735b-71a7-4fc4-b0a9-2ee7fdf2d5d1@github.com> On Thu, 24 Nov 2022 19:23:29 GMT, Aleksey Shipilev <shade at openjdk.org> wrote: > With [JDK-8275286](https://bugs.openjdk.org/browse/JDK-8275286), we added the `Thread::current()` checks for most of the JRT entries. But `JRT_LEAF` is still not checked, because not every `JRT_LEAF` carries a `JavaThread` argument. Having assertions there helps for two reasons. First, these methods can be called from the stub/compiler code, which might be erroneous with thread handling (especially in x86_32 that does not have a dedicated thread register). Second, in the post-Loom world, current thread can change suddenly, as evidenced here: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2022-November/060779.html. > > We can add the thread checks to relevant `JRT_LEAF` methods that accept `JavaThread*` too. 
> > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier2` This pull request has now been integrated. Changeset: b3501fd1 Author: Aleksey Shipilev <shade at openjdk.org> URL: https://git.openjdk.org/jdk/commit/b3501fd11c59813515b46f80283e22b094c6e251 Stats: 19 lines in 7 files changed: 19 ins; 0 del; 0 mod 8297600: Check current thread in selected JRT_LEAF methods Reviewed-by: dholmes, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/11359 From fyang at openjdk.org Wed Nov 30 09:49:28 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 30 Nov 2022 09:49:28 GMT Subject: RFR: 8297763: Fix missing stub code expansion before align() in shared trampolines In-Reply-To: <T5fCT6B-2hhjmlMaRuu5iHfMI-gxkuFJp3jtjGh7ZtM=.719d411d-da4f-482b-af14-c9caee93b865@github.com> References: <T5fCT6B-2hhjmlMaRuu5iHfMI-gxkuFJp3jtjGh7ZtM=.719d411d-da4f-482b-af14-c9caee93b865@github.com> Message-ID: <fCD8mzJxfFzY9xPLgXuyRPgdRorhbJEIsbtxO1NDybs=.dbd2a775-edda-453d-8f12-f7c7f65a7c14@github.com> On Tue, 29 Nov 2022 13:43:20 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote: > This patch fixes missing stub code expansion logic before `align()` for AArch64 and RISC-V. > > The `align()` at most creates 4-byte padding, so a `NativeInstruction::instruction_size` is enough. > > I am considering pre-calculating the total trampoline sizes and allocating them in batches, but maybe after this one, for this is a quick fix to unblock https://github.com/openjdk/jdk/pull/11188. Please see that thread. > > The `assert_alignment(pc());` added in the RISC-V part shows that RVC doesn't change the trampoline stub / static stub logic, so there is no need to adjust the trampoline size for it. [1] > > Tested AArch64 hotspot tier1~3, and 4 is still running; tested RISC-V hotspot tier1~2, and 3~4 are still running. 
> > Thanks, > Xiaolin > > [1] https://github.com/openjdk/jdk/blob/2deb318c9f047ec5a4b160d66a4b52f93688ec42/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L3125-L3126 src/hotspot/cpu/aarch64/codeBuffer_aarch64.cpp line 55: > 53: auto emit = [&](address dest, const CodeBuffer::Offsets &offsets) { > 54: masm.set_code_section(cb->stubs()); > 55: if (cb->stubs()->maybe_expand_to_ensure_remaining(NativeInstruction::instruction_size) && cb->blob() == NULL) { Should we add a check for the real code alignment here and put maybe_expand_to_ensure_remaining() and masm.align(wordSize) operations under that check? ------------- PR: https://git.openjdk.org/jdk/pull/11414 From alanb at openjdk.org Wed Nov 30 09:51:26 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 30 Nov 2022 09:51:26 GMT Subject: RFR: 8296710: Update to use jtreg 7.1 In-Reply-To: <4t-uTHoUVlflpzDnfuHO7TAnix7nNO1MGF28CJBjZBo=.deb056ac-6571-4c79-a272-680b923d58c5@github.com> References: <4t-uTHoUVlflpzDnfuHO7TAnix7nNO1MGF28CJBjZBo=.deb056ac-6571-4c79-a272-680b923d58c5@github.com> Message-ID: <-n11nKKODtr-Mua5vvwmAVEvevjYu2pv3iSzs7tIzzU=.2c8711c8-daab-42b4-a186-4c3f71b4bc73@github.com> On Tue, 29 Nov 2022 14:44:12 GMT, Christian Stein <cstein at openjdk.org> wrote: > Please review the change to update to using jtreg `7.1`. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the requiredVersion has been updated in the various `TEST.ROOT` files. > > This pull request was created by copying the following and using `7.1` at appropriate places: > - https://github.com/openjdk/jdk/pull/9393 Christian has confirmed on the testing so I think this is good. ------------- Marked as reviewed by alanb (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11416 From lkorinth at openjdk.org Wed Nov 30 09:55:27 2022 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 30 Nov 2022 09:55:27 GMT Subject: RFR: 8287400: Make BitMap range parameter names consistent [v2] In-Reply-To: <6MIyyyl41Wje3Zsvzijy7LQzsZXyvuG5G-vHSttHMho=.1bbd25b3-6ca6-4735-9666-c360329ae2cf@github.com> References: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> <6MIyyyl41Wje3Zsvzijy7LQzsZXyvuG5G-vHSttHMho=.1bbd25b3-6ca6-4735-9666-c360329ae2cf@github.com> Message-ID: <uy6oE3XcckbftwUJZZ-QGNCITMn99OTIhnIJaupWWqQ=.4797815b-52de-4ab2-8d5a-7ec5cdbd6a2b@github.com> On Tue, 29 Nov 2022 12:10:11 GMT, Afshin Zafari <duke at openjdk.org> wrote: >> The ranges are determined by 'start' and 'end' all over the bitMap.hpp and bitMap.cpp. All instances of other names for start and end are replaced. > > Afshin Zafari has updated the pull request incrementally with two additional commits since the last revision: > > - 8287400: Make BitMap range parameter names consistent > - Revert "8287400: Make BitMap range parameter names consistent" > > This reverts commit 170f75aab91b3299c0be0f38c321d3025aeba7e8. Approved. Personally I think I would have preferred `start` as it is short and you do not need to abbreviate to `beg` but this looks good and is a nice cleanup. Thanks! ------------- Marked as reviewed by lkorinth (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11375 From sspitsyn at openjdk.org Wed Nov 30 10:05:17 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 30 Nov 2022 10:05:17 GMT Subject: RFR: 8297693: Fix typos in src/hotspot and test/hotspot files [v2] In-Reply-To: <-lbBaUaP4j0boRFXIztrwuaki5AQ5Z-6RhCMSsoHKF0=.026f7ce8-2118-4c26-b629-40adbcf8ce30@github.com> References: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> <-lbBaUaP4j0boRFXIztrwuaki5AQ5Z-6RhCMSsoHKF0=.026f7ce8-2118-4c26-b629-40adbcf8ce30@github.com> Message-ID: <szrio5u7SvYddtEpaEjn9OxQJ84vSOb3St-0bwjCKXQ=.2d8177f0-6d61-44d0-a811-f578082bddaa@github.com> On Tue, 29 Nov 2022 09:15:20 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: >> Can I please get a review for this change which only fixes typos in src/hotspot and test/hotspot files? These changes were originally done by @mernst in PR https://github.com/openjdk/jdk/pull/10029, but given that the other PR touches multiple other files and areas, the progress was stalled. >> >> This PR introduces only hotspot related typo fixes that Michael had proposed in the other PR plus also includes a review suggestion (in one of these files) that was made by Alexey in the other PR. > > Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision: > > Address David's review suggestion Good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11386 From sspitsyn at openjdk.org Wed Nov 30 10:13:02 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 30 Nov 2022 10:13:02 GMT Subject: RFR: 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing [v2] In-Reply-To: <Z3p1R4bbI5blRkjjklVqHrqGfgy7lCmM5JPgbq67im0=.36d7cede-17a5-442d-8dfc-9c4e8afece2b@github.com> References: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com> <Z3p1R4bbI5blRkjjklVqHrqGfgy7lCmM5JPgbq67im0=.36d7cede-17a5-442d-8dfc-9c4e8afece2b@github.com> Message-ID: <Y_6tgPOBZFI6ZnaP0yuO2CLOgxIY-HMe-hgYgz8Jv88=.58c78561-246b-4e66-bb3b-1a6decd5ce96@github.com> On Mon, 28 Nov 2022 11:52:22 GMT, Erik Österlund <eosterlund at openjdk.org> wrote: >> There is a stack walk in JvmtiExport::post_exception_throw() that has safepoints in it. This trips up the stack watermark code. This patch adds a RAII object to JvmtiExport::post_exception_throw() that keeps the thread and its stack fully processed throughout the function. >> Testing: tier1-7 of ZGC tests on linux x86_64 debug and manual testing of the test that failed. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Marked as reviewed by sspitsyn (Reviewer). Thank you for the update. I agree with David that it is not very clear when it is needed. Approving anyway.
------------- PR: https://git.openjdk.org/jdk/pull/11238 From thartmann at openjdk.org Wed Nov 30 10:13:19 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 30 Nov 2022 10:13:19 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v2] In-Reply-To: <DYYb4M9EKHmGR3paCa0XQ0Q2IbX2HfFlIfgOCePWRGo=.f3844b3b-8e72-4a34-a6a4-96ff4d3ce77a@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> <_3SkZlJhWP_RHEgQDrTfSPp9pv8sS8jM4ewrIhucIcg=.a256713b-1f9a-45a7-a13f-e101ceca297b@github.com> <ilh-e9IZV-lFMQEq3rMbOxX927SNc75syjfkmazMBEI=.a90a961d-0494-491e-9020-a6952501cd17@github.com> <YWy2NTRtowyJjRBmh4_7OrxDDfX9ElyDVEnrH-q7eG0=.fb141393-4137-431c-8f4b-677e817d4ce1@github.com> <DYYb4M9EKHmGR3paCa0XQ0Q2IbX2HfFlIfgOCePWRGo=.f3844b3b-8e72-4a34-a6a4-96ff4d3ce77a@github.com> Message-ID: <uqg-Rppen2pXRMJtwERNa5MA6tZ2ycOxXEVtgzNjaDU=.3c1792b2-679a-4217-b87d-c0085aadf54d@github.com> On Wed, 30 Nov 2022 08:54:04 GMT, Dean Long <dlong at openjdk.org> wrote: > OK, please add a TODO in JDK-8254110 explaining this so it doesn't get lost. Thanks, done. 
------------- PR: https://git.openjdk.org/jdk/pull/11316 From eosterlund at openjdk.org Wed Nov 30 10:19:23 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 30 Nov 2022 10:19:23 GMT Subject: RFR: 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing [v2] In-Reply-To: <Y_6tgPOBZFI6ZnaP0yuO2CLOgxIY-HMe-hgYgz8Jv88=.58c78561-246b-4e66-bb3b-1a6decd5ce96@github.com> References: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com> <Z3p1R4bbI5blRkjjklVqHrqGfgy7lCmM5JPgbq67im0=.36d7cede-17a5-442d-8dfc-9c4e8afece2b@github.com> <Y_6tgPOBZFI6ZnaP0yuO2CLOgxIY-HMe-hgYgz8Jv88=.58c78561-246b-4e66-bb3b-1a6decd5ce96@github.com> Message-ID: <_o8drBhIADNppFf5GkgzB7DMC-H7ie1-Z2Nu_x-kWKA=.59a3206c-1675-4a44-bfaa-5351fb582649@github.com> On Wed, 30 Nov 2022 10:09:14 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment > > Thank you for the update. I agree with David that it is not very clear when it is needed. Approving anyway. Thank you for the reviews, @sspitsyn and @dholmes-ora!
------------- PR: https://git.openjdk.org/jdk/pull/11238 From aph at openjdk.org Wed Nov 30 10:20:27 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 30 Nov 2022 10:20:27 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <bSqaxaEIoBnFzpZSkO_GEgh-3Z1zDVCKuOujzvYoN3g=.852b9244-7101-4431-9d5c-b10d25290a54@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <bSqaxaEIoBnFzpZSkO_GEgh-3Z1zDVCKuOujzvYoN3g=.852b9244-7101-4431-9d5c-b10d25290a54@github.com> Message-ID: <5fa9i8vgmsZ-zBtja1VM7fEFBWbrP2dnDmEiolivRq0=.b9a33e09-1ec0-44a5-af3f-7cd12295ba72@github.com> On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley <aph at openjdk.org> wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic I ain't dead yet. 
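
The save-and-restore idea described in the quoted text could be sketched roughly as follows. This is a minimal sketch built on the standard `<cfenv>` functions, not the code from the pull request: `FpEnvGuard` and `call_preserving_fp_env` are hypothetical names, and the callable stands in for a library load such as `dlopen()`. Whether `fesetenv` also restores the denormal-handling bits is platform-specific, so that part is an assumption.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical RAII guard: captures the floating-point environment on
// construction and restores it on destruction.
class FpEnvGuard {
  std::fenv_t _saved;

public:
  FpEnvGuard() { std::fegetenv(&_saved); }
  ~FpEnvGuard() { std::fesetenv(&_saved); }
  FpEnvGuard(const FpEnvGuard&) = delete;
  FpEnvGuard& operator=(const FpEnvGuard&) = delete;
};

// Runs `load_library` (a stand-in for loading a shared library) and undoes
// any change it made to the floating-point environment.
template <typename F>
void call_preserving_fp_env(F load_library) {
  FpEnvGuard guard;
  load_library();
}

// Usage sketch: a callee that switches the rounding mode does not leak
// that change to the caller.
inline void demo_rounding_mode_is_restored() {
  const int before = std::fegetround();
  call_preserving_fp_env([] { std::fesetround(FE_UPWARD); });
  assert(std::fegetround() == before);
}
```

As the quoted description already points out, a guard like this only covers the call sites it wraps; a library that loads another library later would not be caught.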
------------- PR: https://git.openjdk.org/jdk/pull/10661 From sspitsyn at openjdk.org Wed Nov 30 10:23:31 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 30 Nov 2022 10:23:31 GMT Subject: RFR: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests [v3] In-Reply-To: <fSDug5Tre90s5oTZ9YHGnYJpDSbqPH5O9ZsElYRO17E=.399cc8de-7fe8-4bdf-ae04-f46648ee15aa@github.com> References: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> <fSDug5Tre90s5oTZ9YHGnYJpDSbqPH5O9ZsElYRO17E=.399cc8de-7fe8-4bdf-ae04-f46648ee15aa@github.com> Message-ID: <LebBU0uvURQy-tbswbusM8SI0a7ol5z3Ew6R1M4ciQo=.d9a84e36-f5c9-4ff1-95b4-c0da36c8d991@github.com> On Tue, 29 Nov 2022 08:42:29 GMT, Alex Menkov <amenkov at openjdk.org> wrote: >> The fix combines almost the same tests to 1 test to remove code duplication > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Fixed comments LGTM. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/11400 From kbarrett at openjdk.org Wed Nov 30 11:22:31 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 30 Nov 2022 11:22:31 GMT Subject: RFR: 8297830: aarch64: Make Address a discriminated union internally Message-ID: <aP7eQIy57CcOuDDWStdWmefZIfIs_JYRV2Mpqwt5bAY=.34015ef8-fb4c-4276-a003-728dc02530ca@github.com> Please review this change to the aarch64 Address class. It now uses an internal union, separating the literal and nonliteral cases. This avoids leaving some fields uninitialized or initializing them to dummy values. It also reduces the size of the Address class somewhat, though it's unclear whether that makes any noticeable difference.
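
To illustrate the general technique, here is a rough sketch of a tagged (discriminated) union. It is not the actual aarch64 `Address` class: `SketchAddress` and all of its members are hypothetical, chosen only to show how a tag selects the active union member so that each constructor initializes just the fields belonging to its case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for an assembler address operand.
class SketchAddress {
public:
  enum class Kind : uint8_t { Nonliteral, Literal };

private:
  struct Nonliteral {   // register-based form
    int base_reg;
    int64_t offset;
  };
  struct Literal {      // absolute-target form
    const void* target;
  };

  Kind _kind;
  union {               // only the member selected by _kind is valid
    Nonliteral _nonliteral;
    Literal _literal;
  };

public:
  SketchAddress(int base_reg, int64_t offset)
    : _kind(Kind::Nonliteral), _nonliteral{base_reg, offset} {}

  explicit SketchAddress(const void* target)
    : _kind(Kind::Literal), _literal{target} {}

  Kind kind() const { return _kind; }

  int64_t offset() const {
    assert(_kind == Kind::Nonliteral && "offset() needs the nonliteral case");
    return _nonliteral.offset;
  }

  const void* target() const {
    assert(_kind == Kind::Literal && "target() needs the literal case");
    return _literal.target;
  }
};
```

With this layout no constructor has to write dummy values into fields of the other case, and the accessors can assert that the requested case is the active one.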
Testing: mach5 tier1 for linux-aarch64 and macosx-aarch64 mach5 tier2-5 for linux-aarch64 ------------- Commit messages: - unionize Changes: https://git.openjdk.org/jdk/pull/11429/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11429&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297830 Stats: 205 lines in 3 files changed: 110 ins; 16 del; 79 mod Patch: https://git.openjdk.org/jdk/pull/11429.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11429/head:pull/11429 PR: https://git.openjdk.org/jdk/pull/11429 From jpai at openjdk.org Wed Nov 30 11:28:18 2022 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 30 Nov 2022 11:28:18 GMT Subject: RFR: 8297693: Fix typos in src/hotspot and test/hotspot files [v2] In-Reply-To: <-lbBaUaP4j0boRFXIztrwuaki5AQ5Z-6RhCMSsoHKF0=.026f7ce8-2118-4c26-b629-40adbcf8ce30@github.com> References: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> <-lbBaUaP4j0boRFXIztrwuaki5AQ5Z-6RhCMSsoHKF0=.026f7ce8-2118-4c26-b629-40adbcf8ce30@github.com> Message-ID: <mIYyGWK3vH5dhdtdMf7NpU85h12EQ8kxJAUrRAUmZgM=.1a8e4ca7-ac7d-4e11-ae09-0d464e84a6ec@github.com> On Tue, 29 Nov 2022 09:15:20 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: >> Can I please get a review for this change which only fixes typos in src/hotspot and test/hotspot files? These changes were originally done by @mernst in PR https://github.com/openjdk/jdk/pull/10029, but given that the other PR touches multiple other files and areas, the progress was stalled. >> >> This PR introduces only hotspot related typo fixes that Michael had proposed in the other PR plus also includes a review suggestion (in one of these files) that was made by Alexey in the other PR. > > Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision: > > Address David's review suggestion Thank you everyone for the reviews. 
------------- PR: https://git.openjdk.org/jdk/pull/11386 From jpai at openjdk.org Wed Nov 30 11:32:12 2022 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 30 Nov 2022 11:32:12 GMT Subject: Integrated: 8297693: Fix typos in src/hotspot and test/hotspot files In-Reply-To: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> References: <W1cFAGR0hObJhu9SGvzFnXOV37zwgZ6Tv3bwJLJNH2s=.54b5cae7-a24d-436e-8c0b-eae22d0fe2fd@github.com> Message-ID: <LipjxWLtGcdN8jNx8-chrs-DlwePgU3rBc6WHnhBfg4=.6f9e167f-88af-442a-9ca8-00766d7427ea@github.com> On Mon, 28 Nov 2022 09:51:25 GMT, Jaikiran Pai <jpai at openjdk.org> wrote: > Can I please get a review for this change which only fixes typos in src/hotspot and test/hotspot files? These changes were originally done by @mernst in PR https://github.com/openjdk/jdk/pull/10029, but given that the other PR touches multiple other files and areas, the progress was stalled. > > This PR introduces only hotspot related typo fixes that Michael had proposed in the other PR plus also includes a review suggestion (in one of these files) that was made by Alexey in the other PR. This pull request has now been integrated. 
Changeset: 3f8882b2 Author: Jaikiran Pai <jpai at openjdk.org> URL: https://git.openjdk.org/jdk/commit/3f8882b2ebeeb25fbfddc1be3a069181856c2e27 Stats: 15 lines in 12 files changed: 0 ins; 0 del; 15 mod 8297693: Fix typos in src/hotspot and test/hotspot files Co-authored-by: Michael Ernst <mernst at openjdk.org> Reviewed-by: kevinw, dholmes, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/11386 From xlinzheng at openjdk.org Wed Nov 30 11:37:58 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 30 Nov 2022 11:37:58 GMT Subject: RFR: 8297763: Fix missing stub code expansion before align() in shared trampolines [v2] In-Reply-To: <T5fCT6B-2hhjmlMaRuu5iHfMI-gxkuFJp3jtjGh7ZtM=.719d411d-da4f-482b-af14-c9caee93b865@github.com> References: <T5fCT6B-2hhjmlMaRuu5iHfMI-gxkuFJp3jtjGh7ZtM=.719d411d-da4f-482b-af14-c9caee93b865@github.com> Message-ID: <QOm1rf2yvHIVvNhyiPG97YQOX-_5j1VjLoYKbl20lEQ=.98674f31-37eb-4967-bcfe-caad03b5d9ba@github.com> > This patch fixes missing stub code expansion logic before `align()` for AArch64 and RISC-V. > > The `align()` at most creates 4-byte padding, so a `NativeInstruction::instruction_size` is enough. > > I am considering pre-calculating the total trampoline sizes and allocating them in batches, but maybe after this one, for this is a quick fix to unblock https://github.com/openjdk/jdk/pull/11188. Please see that thread. > > The `assert_alignment(pc());` added in the RISC-V part shows that RVC doesn't change the trampoline stub / static stub logic, so there is no need to adjust the trampoline size for it. [1] > > Tested AArch64 hotspot tier1~3, and 4 is still running; tested RISC-V hotspot tier1~2, and 3~4 are still running. 
> > Thanks, > Xiaolin > > [1] https://github.com/openjdk/jdk/blob/2deb318c9f047ec5a4b160d66a4b52f93688ec42/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L3125-L3126 Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: Fix as to comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11414/files - new: https://git.openjdk.org/jdk/pull/11414/files/0c031a78..cf0d0372 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11414&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11414&range=00-01 Stats: 16 lines in 2 files changed: 6 ins; 2 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/11414.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11414/head:pull/11414 PR: https://git.openjdk.org/jdk/pull/11414 From xlinzheng at openjdk.org Wed Nov 30 11:39:33 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 30 Nov 2022 11:39:33 GMT Subject: RFR: 8297763: Fix missing stub code expansion before align() in shared trampolines [v2] In-Reply-To: <fCD8mzJxfFzY9xPLgXuyRPgdRorhbJEIsbtxO1NDybs=.dbd2a775-edda-453d-8f12-f7c7f65a7c14@github.com> References: <T5fCT6B-2hhjmlMaRuu5iHfMI-gxkuFJp3jtjGh7ZtM=.719d411d-da4f-482b-af14-c9caee93b865@github.com> <fCD8mzJxfFzY9xPLgXuyRPgdRorhbJEIsbtxO1NDybs=.dbd2a775-edda-453d-8f12-f7c7f65a7c14@github.com> Message-ID: <YoyS7mVGLkJv9ZWxyp1AjRyzG3tLVeU18cPtKGI6DJk=.80686781-7427-49f8-acfc-327548e25f7e@github.com> On Wed, 30 Nov 2022 09:45:22 GMT, Fei Yang <fyang at openjdk.org> wrote: >> Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix as to comments > > src/hotspot/cpu/aarch64/codeBuffer_aarch64.cpp line 55: > >> 53: auto emit = [&](address dest, const CodeBuffer::Offsets &offsets) { >> 54: masm.set_code_section(cb->stubs()); >> 55: if (cb->stubs()->maybe_expand_to_ensure_remaining(NativeInstruction::instruction_size) && cb->blob() == NULL) { > > Should we add a check for the
real code alignment here and put maybe_expand_to_ensure_remaining() and masm.align(wordSize) operations under that check? Thanks for the suggestion - yes, indeed. ------------- PR: https://git.openjdk.org/jdk/pull/11414 From stefank at openjdk.org Wed Nov 30 12:01:20 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 30 Nov 2022 12:01:20 GMT Subject: RFR: 8297427: Avoid keeping class loaders alive when executing ClassLoaderStatsVMOperation [v2] In-Reply-To: <qVWmEYZDcXkHs5UN68wzWCFctsz_9zfg1XjdbG7FfKw=.1745c615-98ef-46cd-82df-590fd1673a1c@github.com> References: <YdNWoA79MYjB7s-lcu1eYljYAGoV46gjbnYqX0PzJGc=.93d62d9d-0642-4a0d-9f90-b8b1885d4ace@github.com> <qVWmEYZDcXkHs5UN68wzWCFctsz_9zfg1XjdbG7FfKw=.1745c615-98ef-46cd-82df-590fd1673a1c@github.com> Message-ID: <CCoJkGjFXbEqKjIxajrTSusu76Q6MdUKmofvRzFMCJA=.f46ef994-6a96-4ff3-a15d-bd401889ddeb@github.com> On Wed, 30 Nov 2022 09:08:52 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote: >> Please review this change to avoid keeping classes alive only due to the `ClassLoaderStatsVMOperation`. >> >> **Summary** >> The `ClassLoaderStatsVMOperation` is gathering statistics about the active class loaders in a safepoint. The way the `ClassLoaderDataGraph` is iterated will keep the class loaders live. This is not really needed since everything is done in a safepoint and nothing needs to be explicitly kept alive. This has not been a problem prior to concurrent class unloading in ZGC. With fully concurrent class unloading a `ClassLoaderStatsVMOperation` can occur during a collection and more classes than needed might be kept alive. This could in turn lead to premature Metaspace OOM. >> >> The solution is to not keep the class loaders alive due to the iteration in `ClassLoaderStatsVMOperation`. >> >> **Testing** >> * Added a new test that covers the two different ways a class could previously be kept alive by the VM operation. The test passes after the fix but failed before. 
>> * Mach5 tier 1-3 > > Stefan Johansson has updated the pull request incrementally with three additional commits since the last revision: > > - Print object to ensure it is kept alive > - Revert "Axel comments to use templates" > > This reverts commit 8800ef089b62cf173147b68adb2ee993b7e72980. > - Revert "Missing include for minimal" > > This reverts commit 424e0f9d831279ba2d1986ebacb499f8d4a6c078. Looks good! ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.org/jdk/pull/11300 From mcimadamore at openjdk.org Wed Nov 30 12:30:50 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 30 Nov 2022 12:30:50 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v33] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <G74SrqEb7pdRn_g1aTtWKrfhvY-Cn6ikXembTCHRcyg=.3e68ccfc-7a98-481e-8e76-cf36704e39a8@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Polish javadoc: * Make sure that first para of class javadoc is succinct and descriptive * Remove references to "access" var handle or "memory segment view" var handle (just use var handle) * Minor tweak to layout classes javadoc - use `@see` in value layouts instead of a dedicated para. 
* Other minor typos fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/6699ad99..5a75118b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=31-32 Stats: 59 lines in 10 files changed: 19 ins; 18 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From jwaters at openjdk.org Wed Nov 30 12:47:33 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 30 Nov 2022 12:47:33 GMT Subject: RFR: 8250269: Replace ATTRIBUTE_ALIGNED with alignas Message-ID: <9QKV9cYFTo_1D8R-mI80lnewNkA0ceJNKFPbrvICxl4=.d6736b76-8324-4084-bede-6e144b4f6c04@github.com> C++11 added the alignas attribute, for the purpose of specifying alignment on types, much like compiler specific syntax such as gcc's __attribute__((aligned(x))) or Visual C++'s __declspec(align(x)). We can phase out the use of the macro in favor of the standard attribute. In the meantime, we can replace the toolchain specific definitions of ATTRIBUTE_ALIGNED with a portable definition. We might deprecate the use of the macro but changing its implementation quickly and cleanly applies the feature where the macro is being used. Note: With certain parts of HotSpot using ATTRIBUTE_ALIGNED so indiscriminately, this commit will likely take some time to get right This will require adding the alignas attribute to the list of language features approved for use in HotSpot code. 
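
As a small illustration of the standard syntax, consider the sketch below. The type names are made up, and the macro definition shown is only one possible portable form, assumed here for illustration rather than taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One possible portable definition of the macro in terms of the C++11
// keyword (an assumption for illustration).
#define ATTRIBUTE_ALIGNED(x) alignas(x)

// Hypothetical types: one uses the macro, the other the keyword directly.
struct ATTRIBUTE_ALIGNED(64) PaddedCounterViaMacro {
  long value;
};

struct alignas(64) PaddedCounterViaKeyword {
  long value;
};

static_assert(alignof(PaddedCounterViaMacro) == 64, "macro form is 64-byte aligned");
static_assert(alignof(PaddedCounterViaKeyword) == 64, "keyword form is 64-byte aligned");
static_assert(sizeof(PaddedCounterViaKeyword) % 64 == 0, "size is padded to the alignment");

// Runtime check that an object really sits on the requested boundary.
inline bool is_aligned_to(const void* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

One thing to keep in mind is that `alignas` is stricter about where it may appear in a declaration than the compiler-specific attributes, so existing uses of the macro may need to be checked one by one.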
([Addressed in 8252584](https://github.com/openjdk/jdk/pull/11404#issuecomment-1331751901)) ------------- Commit messages: - alignas Changes: https://git.openjdk.org/jdk/pull/11431/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11431&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8250269 Stats: 8 lines in 3 files changed: 0 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11431.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11431/head:pull/11431 PR: https://git.openjdk.org/jdk/pull/11431 From thartmann at openjdk.org Wed Nov 30 13:51:17 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 30 Nov 2022 13:51:17 GMT Subject: RFR: 8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM [v2] In-Reply-To: <lIGuVSmjDi39UeApfZ2ZR2DyeFRABEJYJ4WAFLmCdew=.1f446fb3-8daa-429b-9838-5822287758b3@github.com> References: <FGpni8mDUm5PEVCRqcUg9Fn4AskWedWRL6TreaHahFU=.58f3f49f-23b9-4d67-a0eb-54139a6be675@github.com> <3mq_1qHPFETjQgo4uOX9npzgRzpzvKfCDI2hcWm40zc=.a866444d-223f-4329-bbbe-23008328958e@github.com> <_3SkZlJhWP_RHEgQDrTfSPp9pv8sS8jM4ewrIhucIcg=.a256713b-1f9a-45a7-a13f-e101ceca297b@github.com> <ilh-e9IZV-lFMQEq3rMbOxX927SNc75syjfkmazMBEI=.a90a961d-0494-491e-9020-a6952501cd17@github.com> <YWy2NTRtowyJjRBmh4_7OrxDDfX9ElyDVEnrH-q7eG0=.fb141393-4137-431c-8f4b-677e817d4ce1@github.com> <lIGuVSmjDi39UeApfZ2ZR2DyeFRABEJYJ4WAFLmCdew=.1f446fb3-8daa-429b-9838-5822287758b3@github.com> Message-ID: <xDOh2UilUSaL6MUwIQb5tenJHmAm9Ky2tAq_VGTiYIY=.315df2c9-2fab-4c04-9275-b4c0de0defc0@github.com> On Wed, 30 Nov 2022 06:28:08 GMT, David Holmes <dholmes at openjdk.org> wrote: >>> I'm not sure if it's guaranteed that replay runs single-threaded. I haven't looked into the details, but I think ReplayInline can run in any compiler thread after startup, and then of course there is JDK-8254110. >> >> But ReplayInline does not load ciMethodData, i.e., does not call `process_ciMethodData`, right? 
>> >> The only way this code can get executed is through `JNI_CreateJavaVM_inner -> ciReplay::replay -> ciReplay::replay_impl -> process-> process_command -> process_ciMethodData` and that only ever happens single-threaded. >> >> Of course, if we are ever going to implement [JDK-8254110](https://bugs.openjdk.org/browse/JDK-8254110), we need to revisit that code, but in that case the assert that I added will trigger. >> >> What do you think? >> >>> Method::build_method_counters doesn't appear to be lock-free, in the sense there are no atomic operations; rather concurrency just doesn't seem to be a concern with that code. ?? >> >> It uses the exact same mechanism that I now added to `Method::build_profiling_method_data`, including atomic operations to initialize `_method_counters`: >> https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L651-L654 >> >> And I verified that multiple threads attempt to initialize the counters by adding verification code to: >> https://github.com/openjdk/jdk/blob/f4b5065c37e86f4b2ca26da6ce678febe4a52950/src/hotspot/share/oops/method.cpp#L644-L646 > >> It uses the exact same mechanism that I now added to Method::build_profiling_method_data, including atomic operations to initialize _method_counters > > Sorry I somehow missed the `init_method_counters` logic. :( @dholmes-ora, @dean-long are you okay with the latest version? 
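
For readers following the discussion, the lock-free initialization pattern being referred to could look roughly like the sketch below. It uses `std::atomic` as a stand-in for HotSpot's own atomic primitives, and `Holder`, `Counters`, and `get_or_create` are hypothetical names rather than the real `Method` code.

```cpp
#include <atomic>
#include <cassert>

struct Counters {          // hypothetical payload
  int invocations = 0;
};

class Holder {             // hypothetical owner of the lazily created data
  std::atomic<Counters*> _counters{nullptr};

public:
  ~Holder() { delete _counters.load(std::memory_order_acquire); }

  // Returns the installed object. Each racing thread allocates a candidate,
  // but only one compare-and-swap succeeds; losers free their candidate and
  // use the winner's object.
  Counters* get_or_create() {
    Counters* existing = _counters.load(std::memory_order_acquire);
    if (existing != nullptr) {
      return existing;
    }
    Counters* candidate = new Counters();
    Counters* expected = nullptr;
    if (_counters.compare_exchange_strong(expected, candidate,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
      return candidate;    // this thread won the race
    }
    delete candidate;      // another thread installed its object first
    assert(expected != nullptr);
    return expected;       // compare_exchange stored the winner here
  }
};
```

No lock is held at any point, which is the property that matters for the assert in the subject line: a thread can lose the race, but it never leaves the function owning a lock.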
------------- PR: https://git.openjdk.org/jdk/pull/11316 From eosterlund at openjdk.org Wed Nov 30 14:11:39 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 30 Nov 2022 14:11:39 GMT Subject: RFR: 8296875: Generational ZGC: Refactor loom code [v6] In-Reply-To: <_P0NXex3w0yz8V-4FXZdTKT4Jt_eskqOYRykIoWqVrI=.4553b958-0425-460c-be29-a1ce884d0f27@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> <_P0NXex3w0yz8V-4FXZdTKT4Jt_eskqOYRykIoWqVrI=.4553b958-0425-460c-be29-a1ce884d0f27@github.com> Message-ID: <aiz8hKx8kBgKftlH0gwjqW0fz_1d7Z8XIj56prli2jM=.3d3127fb-2520-4d6b-80da-040e966ca3d6@github.com> On Mon, 28 Nov 2022 15:49:30 GMT, Erik ?sterlund <eosterlund at openjdk.org> wrote: >> The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. >> >> In particular, >> 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. >> >> 2) The cont_oop is located on a stack. In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. 
>> >> 3) Refactoring the stack chunk allocation code >> >> Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. > > Erik ?sterlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - PPC support > - Merge branch 'master' into 8296875_refactor_loom_code > - Patricio concerns > - Fix Richard comments > - Indentation fix > - Fix verification and RISC-V support > - Generational ZGC: Loom support Thanks for the reviews @coleenp and @TheRealMDoerr! ------------- PR: https://git.openjdk.org/jdk/pull/11111 From eosterlund at openjdk.org Wed Nov 30 14:11:39 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 30 Nov 2022 14:11:39 GMT Subject: Integrated: 8296875: Generational ZGC: Refactor loom code In-Reply-To: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> References: <2o2G0DQuCzMxGA0hq148c5E5ysEXUTKf9ymWsa7emOc=.35fa21f1-374e-4d0b-9619-68c81ac89301@github.com> Message-ID: <ZHQINSjaFkRifJEDKOUsvwded-ARaX1nYFakhgoKj7M=.7f61aa67-a0c0-4979-a9ec-cc6a45177879@github.com> On Fri, 11 Nov 2022 16:16:18 GMT, Erik ?sterlund <eosterlund at openjdk.org> wrote: > The current loom code makes some assumptions about GC that will not work with generational ZGC. We should make this code more GC agnostic, and provide a better interface for talking to the GC. > > In particular, > 1) All GCs have a way of encoding oops inside of the heap differently to oops outside of the heap. For non-ZGC collectors, that is compressed oops. For ZGC, that is colored pointers. With generational ZGC, pointers on-heap will be colored and pointers off-heap will be "colorless". So we need to generalize encoding and decoding of oops in the heap, for loom. > > 2) The cont_oop is located on a stack. 
In order to access it we need to start_processing on that thread, if it isn't the current thread. This happened to work so far for ZGC, because the stale pointers had enough colors. But with generational ZGC, these on-stack oops will be colorless, so we have to be more accurate here and ensure processing really has started on any thread that cont_oop is used on. To make life a bit easier, I'm moving the oop processing responsibility for these oops to the thread instead. Currently there is no more than one of these, so doing it lazily per frame seems a bit overkill. > > 3) Refactoring the stack chunk allocation code > > Tested with tier1-5 and manually running Skynet. No regressions detected. We have also been running with this (yet a slightly different backend) in the generational ZGC repo for a while now. This pull request has now been integrated. Changeset: be99e84c Author: Erik Österlund <eosterlund at openjdk.org> URL: https://git.openjdk.org/jdk/commit/be99e84c98786ff9c2c9ca1a979dc17ba810ae09 Stats: 978 lines in 42 files changed: 641 ins; 228 del; 109 mod 8296875: Generational ZGC: Refactor loom code Co-authored-by: Stefan Karlsson <stefank at openjdk.org> Co-authored-by: Axel Boldt-Christmas <aboldtch at openjdk.org> Reviewed-by: stefank, rrich, pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/11111 From eosterlund at openjdk.org Wed Nov 30 14:17:38 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 30 Nov 2022 14:17:38 GMT Subject: Integrated: 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing In-Reply-To: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com> References: <fB7q8zNvQWx0nDWbT1xNLMmUaBncVU6iWT5TO4tHUDo=.af279f42-23ce-4c6e-ae9f-13bf4b898df4@github.com> Message-ID: <_Doo0JxrTmsYBEEb8o8TGbQBwsaunDBBiIhx44LYoHM=.41592c3d-715a-48e7-8651-6957903edc4f@github.com> On Fri, 18 Nov 2022 12:30:19 GMT, Erik Österlund
<eosterlund at openjdk.org> wrote: > There is a stack walk in JvmtiExport::post_exception_throw() that has safepoints in it. This trips up the stack watermark code. This patch adds a RAII object to JvmtiExport::post_exception_throw() that keeps the thread and its stack fully processed throughout the function. > Testing: tier1-7 of ZGC tests on linux x86_64 debug and manual testing of the test that failed. This pull request has now been integrated. Changeset: be4245e8 Author: Erik Österlund <eosterlund at openjdk.org> URL: https://git.openjdk.org/jdk/commit/be4245e814cc29701cc425d8e66854e36eb3aef0 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing Reviewed-by: pchilanomate, sspitsyn, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/11238 From kbarrett at openjdk.org Wed Nov 30 14:36:32 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 30 Nov 2022 14:36:32 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas [v2] In-Reply-To: <q-_icded_J6Bu_FntBn_NzYa-o9ZplD4Mr5FmvsZYe4=.eedc699c-a423-4bf6-b97f-ea3367d6ec9d@github.com> References: <CVHNRNKRV_f2n8F5s0AiQR7lgIrHBHzaUj5ewTDMU7I=.9920de89-7b18-4f22-9ca8-94479d4a292f@github.com> <q-_icded_J6Bu_FntBn_NzYa-o9ZplD4Mr5FmvsZYe4=.eedc699c-a423-4bf6-b97f-ea3367d6ec9d@github.com> Message-ID: <K_Zeb6O6NaqecfB9FGU-LE-FFhqM8JOCO3UHK_uyjZM=.5cedb73b-3077-47d9-aab0-892c6b4d3740@github.com> On Wed, 30 Nov 2022 07:26:46 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> Add alignas to the permitted features set with some restrictions. (Thanks @kimbarrett for the help) > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into alignas > - HotSpot Style Guide changes This change should not have been integrated yet. Changes to the Style Guide have an approval process that is described in that document, including a 2 week voting period and final approval by the HotSpot lead. ------------- PR: https://git.openjdk.org/jdk/pull/11404 From jwaters at openjdk.org Wed Nov 30 14:57:35 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 30 Nov 2022 14:57:35 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas [v2] In-Reply-To: <q-_icded_J6Bu_FntBn_NzYa-o9ZplD4Mr5FmvsZYe4=.eedc699c-a423-4bf6-b97f-ea3367d6ec9d@github.com> References: <CVHNRNKRV_f2n8F5s0AiQR7lgIrHBHzaUj5ewTDMU7I=.9920de89-7b18-4f22-9ca8-94479d4a292f@github.com> <q-_icded_J6Bu_FntBn_NzYa-o9ZplD4Mr5FmvsZYe4=.eedc699c-a423-4bf6-b97f-ea3367d6ec9d@github.com> Message-ID: <03SfvVDuCNMAnuuZg0BVsDhDkbLufbkEC69WQPNJ3SQ=.432ccf2b-ce8f-40bc-a3a5-1f28809839a1@github.com> On Wed, 30 Nov 2022 07:26:46 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> Add alignas to the permitted features set with some restrictions. (Thanks @kimbarrett for the help) > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into alignas > - HotSpot Style Guide changes Oops, I did not realize that. Temporarily back this out for now?
I can (hopefully) do that rather quickly ------------- PR: https://git.openjdk.org/jdk/pull/11404 From jwaters at openjdk.org Wed Nov 30 15:04:27 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 30 Nov 2022 15:04:27 GMT Subject: RFR: 8252584: HotSpot Style Guide should permit alignas [v2] In-Reply-To: <q-_icded_J6Bu_FntBn_NzYa-o9ZplD4Mr5FmvsZYe4=.eedc699c-a423-4bf6-b97f-ea3367d6ec9d@github.com> References: <CVHNRNKRV_f2n8F5s0AiQR7lgIrHBHzaUj5ewTDMU7I=.9920de89-7b18-4f22-9ca8-94479d4a292f@github.com> <q-_icded_J6Bu_FntBn_NzYa-o9ZplD4Mr5FmvsZYe4=.eedc699c-a423-4bf6-b97f-ea3367d6ec9d@github.com> Message-ID: <gQi726n-SmTuCluAy6HDFDnK_kRKU7cD8JDQIpA3uNg=.0fbc6037-8424-4b9a-a7bc-b1f60a81393e@github.com> On Wed, 30 Nov 2022 07:26:46 GMT, Julian Waters <jwaters at openjdk.org> wrote: >> Add alignas to the permitted features set with some restrictions. (Thanks @kimbarrett for the help) > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into alignas > - HotSpot Style Guide changes I've created an "emergency" revert at https://github.com/openjdk/jdk/pull/11433, sorry for the mistake ------------- PR: https://git.openjdk.org/jdk/pull/11404 From jwaters at openjdk.org Wed Nov 30 15:11:21 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 30 Nov 2022 15:11:21 GMT Subject: RFR: 8297852: Backout 8252584 for the time being Message-ID: <9rSgaMOb56fcYoRPJX6rgi1jcvzxQXsMqAod76PmF4s=.97ad88c3-1250-4aea-a1f9-1c78987a3ad6@github.com> Revert 8252584 temporarily (My mistake) ------------- Commit messages: - Emergency Revert of 8252584 Changes: https://git.openjdk.org/jdk/pull/11433/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11433&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297852 Stats: 94 lines in 2 files changed: 0 ins; 92 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11433.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11433/head:pull/11433 PR: https://git.openjdk.org/jdk/pull/11433 From mcimadamore at openjdk.org Wed Nov 30 15:14:26 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 30 Nov 2022 15:14:26 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v34] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <HN_lq5U4JCG7wQCRjwb8qP9gYU-GWDgEEyBYpaNo33Y=.e93cc490-29eb-4ce0-af54-1b6f130d1ccf@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Address review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/5a75118b..ce85d182 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=32-33 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From luhenry at openjdk.org Wed Nov 30 15:14:35 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 30 Nov 2022 15:14:35 GMT Subject: RFR: 8297697: RISC-V: Add support for SATP mode detection [v3] In-Reply-To: <lEyL9get5mVAM3E9yw4hf-uSHGD45a6P6CfwM10KIn4=.ef4f6a41-a7da-4e49-8193-11caec0c244b@github.com> References: <Xji-rRKv8AIaIZNkWPLycJo5RR1AsCXbxcf3fz9vrjM=.2d30a970-4ff4-449c-bd47-1443b59d2120@github.com> <lEyL9get5mVAM3E9yw4hf-uSHGD45a6P6CfwM10KIn4=.ef4f6a41-a7da-4e49-8193-11caec0c244b@github.com> Message-ID: <19LRRu8SCc6Qoar19Xr1XvA17ZIPehN12r4lVohh4G4=.833efe6b-c298-4875-af3d-74c95ef447eb@github.com> On Tue, 29 Nov 2022 14:27:02 GMT, Feilong Jiang <fjiang at openjdk.org> wrote: >> RISC-V gets sv57-based virtual memory support since Linux 5.18 [1]. There are some reports of the OpenJDK RISC-V port crashing on Linux 5.18+ with QEMU-system 7.10+ when sv57 was enabled [2][3] as currently RISC-V port only supports up to sv48. >> As discussed in [3], given the fact that there are no existing boards or hardware even support anything more than sv48, >> we decide to add detection for SATP (Supervisor Address Translation and Protection) mode at JVM startup time if possible and explicitly issue a warning and stop early when sv57 is enabled. 
>> >> When sv57 is enabled, the output of java -version would be: >> >> >> root at qemuriscv64:~# jdk/bin/java -version >> Error occurred during initialization of VM >> Unsupported satp mode: sv57 >> >> >> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa5b537b0ecc16992577b013f11112d54c7ce869 >> [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000639.html >> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-November/000681.html >> >> Testing: >> >> - QEMU-system with sv48/sv57-enabled Linux image `-version` test >> - HiFive Unmatched board (sv39) `-version` test > > Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: > > remove sv32 Marked as reviewed by luhenry (Author). ------------- PR: https://git.openjdk.org/jdk/pull/11388 From mcimadamore at openjdk.org Wed Nov 30 15:30:40 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 30 Nov 2022 15:30:40 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v35] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 67 additional commits since the last revision: - Merge branch 'master' into PR_20 - Address review comment - Polish javadoc: * Make sure that first para of class javadoc is succinct and descriptive * Remove references to "access" var handle or "memory segment view" var handle (just use var handle) * Minor tweak to layout classes javadoc - use `@see` in value layouts instead of a dedicated para. * Other minor typos fixes - Address review comments - * remove unused Scoped interface * re-add trusting of final fields in layout class implementations * Fix BulkOps benchmark, which had alignment issues - Fix bit vs. byte mismatch in test - Fix wrong check in MemorySegment::spliterator/elements (The check which ensures that the segment size is multiple of spliterator element size is bogus) - Address more review comments - Fix bad @throws in MemorySegment::copy methods - Address review comments - ... and 57 more: https://git.openjdk.org/jdk/compare/d0d99ae1...8668fb39 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/ce85d182..8668fb39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=33-34 Stats: 65983 lines in 1282 files changed: 30320 ins; 21180 del; 14483 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From aph at openjdk.org Wed Nov 30 15:32:23 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 30 Nov 2022 15:32:23 GMT Subject: RFR: 8297830: aarch64: Make Address a descriminated union internally In-Reply-To: <aP7eQIy57CcOuDDWStdWmefZIfIs_JYRV2Mpqwt5bAY=.34015ef8-fb4c-4276-a003-728dc02530ca@github.com> References: <aP7eQIy57CcOuDDWStdWmefZIfIs_JYRV2Mpqwt5bAY=.34015ef8-fb4c-4276-a003-728dc02530ca@github.com> Message-ID: 
<in_YbHtsyqlgu8Z4BR0D9U-cGYvuTvYDDCL_0WwP15s=.42fe7b60-592b-4f94-babc-249dd5761e0e@github.com> On Wed, 30 Nov 2022 11:13:35 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this change to the aarch64 Address class. It now uses an > internal union, separating the literal and nonliteral cases. > > This avoids leaving some fields uninitialized or initializing them to dummy > values. It also reduces the size of the Address class somewhat, though it's > unclear whether that makes any noticeable difference. > > Testing: > mach5 tier1 for linux-aarch64 and macosx-aarch64 > mach5 tier2-5 for linux-aarch64 OK, thanks. It does seem to me a bit like moving the furniture around, but I have to admit it's an improvement. :-) ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.org/jdk/pull/11429 From alanb at openjdk.org Wed Nov 30 16:41:16 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 30 Nov 2022 16:41:16 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v35] In-Reply-To: <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> Message-ID: <ifYkXDEMXNqgWiktjDfPbS1dFLMkDosUfVrWCzzfobc=.d43429a8-82aa-4439-a6d9-60eafdc5c9b8@github.com> On Wed, 30 Nov 2022 15:30:40 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 67 additional commits since the last revision: > > - Merge branch 'master' into PR_20 > - Address review comment > - Polish javadoc: > * Make sure that first para of class javadoc is succinct and descriptive > * Remove references to "access" var handle or "memory segment view" var handle (just use var handle) > * Minor tweak to layout classes javadoc - use `@see` in value layouts instead of a dedicated para. > * Other minor typos fixes > - Address review comments > - * remove unused Scoped interface > * re-add trusting of final fields in layout class implementations > * Fix BulkOps benchmark, which had alignment issues > - Fix bit vs. byte mismatch in test > - Fix wrong check in MemorySegment::spliterator/elements > (The check which ensures that the segment size is multiple of spliterator element size is bogus) > - Address more review comments > - Fix bad @throws in MemorySegment::copy methods > - Address review comments > - ... and 57 more: https://git.openjdk.org/jdk/compare/4c9f206a...8668fb39 src/java.base/share/classes/java/nio/channels/FileChannel.java line 1004: > 1002: * Maps a region of this channel's file into a new mapped memory segment, with the given offset, > 1003: * size and memory session. The {@linkplain MemorySegment#address() address} of the returned memory segment > 1004: * is the starting address of the mapped off-heap region backing the segment. Would you mind reflowing this paragraph to that the line lengths are a bit more consistent with the paragraphs that follow? That would also help with side-by-side views when looking at changes. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From duke at openjdk.org Wed Nov 30 16:43:06 2022 From: duke at openjdk.org (Afshin Zafari) Date: Wed, 30 Nov 2022 16:43:06 GMT Subject: Integrated: 8287400: Make BitMap range parameter names consistent In-Reply-To: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> References: <bZjGIhpS8we3xd3Vk8TOqsR7l8DBIUhxCnRV5bn6xTw=.52c6752d-d391-4594-b93b-cf52d4df8431@github.com> Message-ID: <qT9G6yUs6SK2tZw_K6dJdtyAa54esfN8K2pKFYFsGAc=.e3b05bbc-e106-4e04-9163-5492688dc604@github.com> On Sat, 26 Nov 2022 22:16:34 GMT, Afshin Zafari <duke at openjdk.org> wrote: > The ranges are determined by 'start' and 'end' all over the bitMap.hpp and bitMap.cpp. All instances of other names for start and end are replaced. This pull request has now been integrated. Changeset: dcf431db Author: Afshin Zafari <Afshin.zafari at oracle.com> Committer: Robbin Ehn <rehn at openjdk.org> URL: https://git.openjdk.org/jdk/commit/dcf431db0b88c33e574b5986f22df5ed6e9b8be4 Stats: 12 lines in 2 files changed: 0 ins; 0 del; 12 mod 8287400: Make BitMap range parameter names consistent Reviewed-by: dholmes, lkorinth ------------- PR: https://git.openjdk.org/jdk/pull/11375 From alanb at openjdk.org Wed Nov 30 16:44:42 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 30 Nov 2022 16:44:42 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v35] In-Reply-To: <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> Message-ID: <2wV4OEuvJQXQGnRNZ7qhv1PZuMlEYFBqnDgOp5L6D9U=.76a9f864-872a-4c39-a02e-2b0646414571@github.com> On Wed, 30 Nov 2022 15:30:40 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains 
the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 67 additional commits since the last revision: > > - Merge branch 'master' into PR_20 > - Address review comment > - Polish javadoc: > * Make sure that first para of class javadoc is succinct and descriptive > * Remove references to "access" var handle or "memory segment view" var handle (just use var handle) > * Minor tweak to layout classes javadoc - use `@see` in value layouts instead of a dedicated para. > * Other minor typos fixes > - Address review comments > - * remove unused Scoped interface > * re-add trusting of final fields in layout class implementations > * Fix BulkOps benchmark, which had alignment issues > - Fix bit vs. byte mismatch in test > - Fix wrong check in MemorySegment::spliterator/elements > (The check which ensures that the segment size is multiple of spliterator element size is bogus) > - Address more review comments > - Fix bad @throws in MemorySegment::copy methods > - Address review comments > - ... and 57 more: https://git.openjdk.org/jdk/compare/e1da2b11...8668fb39 src/java.base/share/classes/java/lang/foreign/SegmentScope.java line 1: > 1: package java.lang.foreign; This one is missing a header. 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From kbarrett at openjdk.org Wed Nov 30 16:48:29 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 30 Nov 2022 16:48:29 GMT Subject: RFR: 8297830: aarch64: Make Address a descriminated union internally In-Reply-To: <in_YbHtsyqlgu8Z4BR0D9U-cGYvuTvYDDCL_0WwP15s=.42fe7b60-592b-4f94-babc-249dd5761e0e@github.com> References: <aP7eQIy57CcOuDDWStdWmefZIfIs_JYRV2Mpqwt5bAY=.34015ef8-fb4c-4276-a003-728dc02530ca@github.com> <in_YbHtsyqlgu8Z4BR0D9U-cGYvuTvYDDCL_0WwP15s=.42fe7b60-592b-4f94-babc-249dd5761e0e@github.com> Message-ID: <_t8eVN9t_wi4Ap0MbkHbVzdBn8F1obrLd1k8FARg888=.19819b97-e9dc-4f42-b926-5843a12a4056@github.com> On Wed, 30 Nov 2022 15:30:02 GMT, Andrew Haley <aph at openjdk.org> wrote: > OK, thanks. It does seem to me a bit like moving the furniture around, but I have to admit it's an improvement. :-) The driver for all these cleanups around aarch64 Address is that when I tried to fix JDK-8160404 I got (valid) compiler warnings here about UB. ------------- PR: https://git.openjdk.org/jdk/pull/11429 From dcubed at openjdk.org Wed Nov 30 16:48:34 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 30 Nov 2022 16:48:34 GMT Subject: RFR: 8297830: aarch64: Make Address a descriminated union internally In-Reply-To: <aP7eQIy57CcOuDDWStdWmefZIfIs_JYRV2Mpqwt5bAY=.34015ef8-fb4c-4276-a003-728dc02530ca@github.com> References: <aP7eQIy57CcOuDDWStdWmefZIfIs_JYRV2Mpqwt5bAY=.34015ef8-fb4c-4276-a003-728dc02530ca@github.com> Message-ID: <i-weM9XJNKyNrBJs3RVWojFcJGLsSSdjbNExIt6NQqI=.f9317f5c-5558-4b43-9f9b-dd13dc457816@github.com> On Wed, 30 Nov 2022 11:13:35 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this change to the aarch64 Address class. It now uses an > internal union, separating the literal and nonliteral cases. > > This avoids leaving some fields uninitialized or initializing them to dummy > values. 
It also reduces the size of the Address class somewhat, though it's > unclear whether that makes any noticeable difference. > > Testing: > mach5 tier1 for linux-aarch64 and macosx-aarch64 > mach5 tier2-5 for linux-aarch64 s/descriminated/discriminated/ Use "/issue JDK-8297830" to update the PR title... or just do the single char fix. ------------- PR: https://git.openjdk.org/jdk/pull/11429 From alanb at openjdk.org Wed Nov 30 16:49:00 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 30 Nov 2022 16:49:00 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v35] In-Reply-To: <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> Message-ID: <1Ao-HvZlCHoGgLIJSJTXOOnvoR1pRtaZoljYUZpFEv0=.1a93bb09-06aa-4fad-905a-41f5f12b6945@github.com> On Wed, 30 Nov 2022 15:30:40 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 67 additional commits since the last revision: > > - Merge branch 'master' into PR_20 > - Address review comment > - Polish javadoc: > * Make sure that first para of class javadoc is succinct and descriptive > * Remove references to "access" var handle or "memory segment view" var handle (just use var handle) > * Minor tweak to layout classes javadoc - use `@see` in value layouts instead of a dedicated para. 
> * Other minor typos fixes > - Address review comments > - * remove unused Scoped interface > * re-add trusting of final fields in layout class implementations > * Fix BulkOps benchmark, which had alignment issues > - Fix bit vs. byte mismatch in test > - Fix wrong check in MemorySegment::spliterator/elements > (The check which ensures that the segment size is multiple of spliterator element size is bogus) > - Address more review comments > - Fix bad @throws in MemorySegment::copy methods > - Address review comments > - ... and 57 more: https://git.openjdk.org/jdk/compare/3e822e72...8668fb39 src/java.base/share/classes/java/lang/ModuleLayer.java line 313: > 311: * where possible. > 312: * > 313: * @since 20 We usually put the "@since 20" after the params/return/throws. ------------- PR: https://git.openjdk.org/jdk/pull/10872 From kbarrett at openjdk.org Wed Nov 30 16:50:38 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 30 Nov 2022 16:50:38 GMT Subject: RFR: 8293824: gc/whitebox/TestConcMarkCycleWB.java failed "RuntimeException: assertTrue: expected true, was false" Message-ID: <74AnEqLugaEU80XMNm8edtWBmK8gOFMirnZT1rdj1jI=.6ab039c0-3db5-4336-9aa7-867166b79cda@github.com> Please review this change to WhiteBox and some tests involving G1 concurrent GCs. Some tests currently use WhiteBox.g1StartConcMarkCycle() to trigger a concurrent GC. Many of them follow it with a loop waiting for a concurrent cycle to not be in progress. A few also precede that call with a similar loop, since that call does nothing and returns false if a concurrent cycle is already in progress. Those tests typically want to ensure there was a concurrent cycle that was started after some setup. The failing test calls that function, asserting that it returned true, i.e. a new concurrent cycle was started.
There are various problems with this, due to races with concurrent cycles started automatically and possibly aborted (by full GCs) concurrent cycles, making some of these tests unreliable in some configurations. For example, the test failure associated with this bug intermittently arises when running with `-Xcomp`, triggering a concurrent cycle before the explicit requests by the test, causing the explicit request to fail (because there is already one in progress), failing the assert. Waiting for there not to be an in-progress cycle before the explicit request just narrows the race window. We have a different mechanism for controlling concurrent cycles, the concurrent GC breakpoint mechanism. By adding a counter specifically for such cycles, we can use GC breakpoints to ensure only the concurrent cycles the test wants are occurring, and can verify they completed successfully. So we change tests using WhiteBox.g1StartConcMarkCycle() to instead use GC breakpoints, along with the new WhiteBox.g1CompletedConcurrentMarkCycles() to avoid racing request problems and to detect aborted cycles. Since it is no longer used, WhiteBox.g1StartConcMarkCycle() is removed. 
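The race described above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `ToyConcurrentCollector` and all of its methods are hypothetical names invented for illustration and are not the WhiteBox or G1 API. The model also leaves out the GC breakpoint mechanism itself and only tries to show why a completed-cycle counter is a more reliable signal than the return value of a start request.

```java
// Hypothetical toy model of a collector's concurrent-cycle state.
// It is NOT the HotSpot WhiteBox API; every name here is illustrative.
final class ToyConcurrentCollector {
    private boolean cycleInProgress = false;
    private long completedCycles = 0;

    // Models a "start if idle" request: it does nothing and returns false
    // when a cycle is already in progress.
    synchronized boolean requestCycle() {
        if (cycleInProgress) {
            return false;
        }
        cycleInProgress = true;
        return true;
    }

    // Models a cycle that the collector starts on its own.
    synchronized void startAutomaticCycle() {
        cycleInProgress = true;
    }

    // Ends the current cycle; only cycles that were not aborted are counted.
    synchronized void endCycle(boolean aborted) {
        if (!cycleInProgress) {
            return;
        }
        cycleInProgress = false;
        if (!aborted) {
            completedCycles++;
        }
    }

    synchronized boolean isCycleInProgress() {
        return cycleInProgress;
    }

    synchronized long completedCycles() {
        return completedCycles;
    }
}
```

In this model, a test that asserts on `requestCycle()` fails whenever `startAutomaticCycle()` happened first, even though a cycle is running. A test that records `completedCycles()` before its setup and compares it afterwards is unaffected by who started the cycle, and it also notices an aborted cycle because the counter does not advance.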
Testing: mach5 tier1-6 ------------- Commit messages: - copyrights - remove WB.g1StartConcMarkCyle - update tests - add utility GC breakpoint functions - add G1ConcurrentMark::completed_mark_cycles() and whitebox access Changes: https://git.openjdk.org/jdk/pull/11435/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11435&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293824 Stats: 268 lines in 21 files changed: 93 ins; 122 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/11435.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11435/head:pull/11435 PR: https://git.openjdk.org/jdk/pull/11435 From kbarrett at openjdk.org Wed Nov 30 16:56:16 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 30 Nov 2022 16:56:16 GMT Subject: RFR: 8297852: Backout 8252584 for the time being In-Reply-To: <9rSgaMOb56fcYoRPJX6rgi1jcvzxQXsMqAod76PmF4s=.97ad88c3-1250-4aea-a1f9-1c78987a3ad6@github.com> References: <9rSgaMOb56fcYoRPJX6rgi1jcvzxQXsMqAod76PmF4s=.97ad88c3-1250-4aea-a1f9-1c78987a3ad6@github.com> Message-ID: <rZ7-BhUvgnIoDZf6QAeYy3cyGm81UZpdQY96V7dC0_s=.e9f7fcc9-dfb5-4c72-b03a-4b2834fc1ef0@github.com> On Wed, 30 Nov 2022 15:00:32 GMT, Julian Waters <jwaters at openjdk.org> wrote: > Revert 8252584 temporarily (My mistake) I've asked @vnkozlov what he wants to do about the premature integration. ------------- PR: https://git.openjdk.org/jdk/pull/11433 From dcubed at openjdk.org Wed Nov 30 16:56:20 2022 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 30 Nov 2022 16:56:20 GMT Subject: RFR: 8297830: aarch64: Make Address a discriminated union internally In-Reply-To: <aP7eQIy57CcOuDDWStdWmefZIfIs_JYRV2Mpqwt5bAY=.34015ef8-fb4c-4276-a003-728dc02530ca@github.com> References: <aP7eQIy57CcOuDDWStdWmefZIfIs_JYRV2Mpqwt5bAY=.34015ef8-fb4c-4276-a003-728dc02530ca@github.com> Message-ID: <H1qy-Pq6Cuwb9WlP7xhaGA3pvyqKTQJtgYuO2OYc-Dk=.0a8eab44-f17d-497b-9c8b-2243e64a1841@github.com> On Wed, 30 Nov 2022 11:13:35 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > Please review this change to the aarch64 Address class. It now uses an > internal union, separating the literal and nonliteral cases. > > This avoids leaving some fields uninitialized or initializing them to dummy > values. It also reduces the size of the Address class somewhat, though it's > unclear whether that makes any noticeable difference. > > Testing: > mach5 tier1 for linux-aarch64 and macosx-aarch64 > mach5 tier2-5 for linux-aarch64 Thumbs up. A lot of small touches all over assembler_aarch64.hpp but the result is definitely cleaner... ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.org/jdk/pull/11429 From alanb at openjdk.org Wed Nov 30 16:56:56 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 30 Nov 2022 16:56:56 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v35] In-Reply-To: <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> Message-ID: <ba5_3zIynQPMBvz4RzI35uYxeHxfnH9fJYorRs6trvc=.f2fda441-74aa-4da7-bc7d-03e20a580b48@github.com> On Wed, 30 Nov 2022 15:30:40 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. 
A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 67 additional commits since the last revision: > > - Merge branch 'master' into PR_20 > - Address review comment > - Polish javadoc: > * Make sure that first para of class javadoc is succinct and descriptive > * Remove references to "access" var handle or "memory segment view" var handle (just use var handle) > * Minor tweak to layout classes javadoc - use `@see` in value layouts instead of a dedicated para. > * Other minor typos fixes > - Address review comments > - * remove unused Scoped interface > * re-add trusting of final fields in layout class implementations > * Fix BulkOps benchmark, which had alignment issues > - Fix bit vs. byte mismatch in test > - Fix wrong check in MemorySegment::spliterator/elements > (The check which ensures that the segment size is multiple of spliterator element size is bogus) > - Address more review comments > - Fix bad @throws in MemorySegment::copy methods > - Address review comments > - ... and 57 more: https://git.openjdk.org/jdk/compare/9a07ecac...8668fb39 src/java.base/share/classes/java/lang/foreign/SegmentScope.java line 69: > 67: * Creates a new scope that is managed, automatically, by the garbage collector. > 68: * Segments associated with the returned scope can be > 69: * {@linkplain SegmentScope#isAccessibleBy(Thread) accessed} by multiple threads. "can be accessed by multiple threads" hints a bit of concurrency. It might be clearer to say "by any thread". 
------------- PR: https://git.openjdk.org/jdk/pull/10872 From alanb at openjdk.org Wed Nov 30 17:08:13 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 30 Nov 2022 17:08:13 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v35] In-Reply-To: <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <qmTRPgpsuh3MB6RWmVW5G5Y46JWoevBnYC2ljqud5eY=.9571f7b3-113b-420d-90ec-f2f4207d3cc0@github.com> Message-ID: <9lmq8wD3c1YD4s0NGrHQXV6CgVJyJ9S42xUmx1FzXJ0=.2f64ed69-a54d-4517-ae29-1539f06cffe0@github.com> On Wed, 30 Nov 2022 15:30:40 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 67 additional commits since the last revision: > > - Merge branch 'master' into PR_20 > - Address review comment > - Polish javadoc: > * Make sure that first para of class javadoc is succinct and descriptive > * Remove references to "access" var handle or "memory segment view" var handle (just use var handle) > * Minor tweak to layout classes javadoc - use `@see` in value layouts instead of a dedicated para. > * Other minor typos fixes > - Address review comments > - * remove unused Scoped interface > * re-add trusting of final fields in layout class implementations > * Fix BulkOps benchmark, which had alignment issues > - Fix bit vs. 
byte mismatch in test > - Fix wrong check in MemorySegment::spliterator/elements > (The check which ensures that the segment size is multiple of spliterator element size is bogus) > - Address more review comments > - Fix bad @throws in MemorySegment::copy methods > - Address review comments > - ... and 57 more: https://git.openjdk.org/jdk/compare/ddc274f3...8668fb39 src/java.base/share/classes/java/lang/foreign/Arena.java line 135: > 133: * @apiNote This operation is not idempotent; that is, closing an already closed arena <em>always</em> results in an > 134: * exception being thrown. This reflects a deliberate design choice: arena state transitions should be > 135: * manifest in the client code; a failure in any of these transitions reveals a bug in the underlying application Not important but I'm not sure about the wording here. Maybe you mean "manifested" or "should manifest" ? src/java.base/share/classes/java/lang/foreign/Arena.java line 155: > 153: > 154: /** > 155: * {@return a new confined arena} For completeness, this should probably say "a new confined arena owned by the current thread". ------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Wed Nov 30 18:14:00 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 30 Nov 2022 18:14:00 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v36] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <1oR6S6K1w-GPz7Mw67Sqw9s8mPI4YDyC9_FOOjIqJU4=.9e645539-2bba-4740-be1e-e61493a3252f@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/8668fb39..df8a4a63 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=34-35 Stats: 34 lines in 3 files changed: 29 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From aph at openjdk.org Wed Nov 30 18:19:45 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 30 Nov 2022 18:19:45 GMT Subject: RFR: 8297830: aarch64: Make Address a discriminated union internally In-Reply-To: <_t8eVN9t_wi4Ap0MbkHbVzdBn8F1obrLd1k8FARg888=.19819b97-e9dc-4f42-b926-5843a12a4056@github.com> References: <aP7eQIy57CcOuDDWStdWmefZIfIs_JYRV2Mpqwt5bAY=.34015ef8-fb4c-4276-a003-728dc02530ca@github.com> <in_YbHtsyqlgu8Z4BR0D9U-cGYvuTvYDDCL_0WwP15s=.42fe7b60-592b-4f94-babc-249dd5761e0e@github.com> <_t8eVN9t_wi4Ap0MbkHbVzdBn8F1obrLd1k8FARg888=.19819b97-e9dc-4f42-b926-5843a12a4056@github.com> Message-ID: <5y76cUcUmXU65CqHVy12wRWidDj6QAifCRGryIx1-Hw=.e50e42f6-b765-4986-8754-5a2b91175505@github.com> On Wed, 30 Nov 2022 16:45:35 GMT, Kim Barrett <kbarrett at openjdk.org> wrote: > The driver for all these cleanups around aarch64 Address is that when I tried to fix JDK-8160404 I got (valid) compiler warnings here about UB. Aha! Thanks. It's good to have that here, in the discussion. 
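As an illustration of the technique discussed in this thread (8297830), a class that stores either a literal target or a base-plus-offset pair in an internal union might look roughly like the sketch below. This is not the aarch64 `Address` class; the names `SketchAddress`, `Kind`, `Nonliteral` and `Literal` are hypothetical and chosen only for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of a discriminated (tagged) union.
// It is NOT the HotSpot aarch64 Address class.
class SketchAddress {
 public:
  // The discriminator: records which union member is active.
  enum class Kind { base_plus_offset, literal };

 private:
  struct Nonliteral { int base_reg; std::int64_t offset; };
  struct Literal    { const void* target; };

  // Exactly one member is initialized, chosen by the constructor used.
  union Payload {
    Nonliteral nonliteral;
    Literal    literal;
    explicit Payload(Nonliteral n) : nonliteral(n) {}
    explicit Payload(Literal l) : literal(l) {}
  };

  Kind    _kind;
  Payload _payload;

 public:
  // Non-literal case: a base register number plus a byte offset.
  SketchAddress(int base, std::int64_t off)
      : _kind(Kind::base_plus_offset), _payload(Nonliteral{base, off}) {}

  // Literal case: an absolute target.
  explicit SketchAddress(const void* t)
      : _kind(Kind::literal), _payload(Literal{t}) {}

  Kind kind() const { return _kind; }

  // Each accessor asserts that it reads the active member only.
  int base_reg() const {
    assert(_kind == Kind::base_plus_offset);
    return _payload.nonliteral.base_reg;
  }
  std::int64_t offset() const {
    assert(_kind == Kind::base_plus_offset);
    return _payload.nonliteral.offset;
  }
  const void* target() const {
    assert(_kind == Kind::literal);
    return _payload.literal.target;
  }
};
```

Because only the active member is ever initialized, no field has to be left uninitialized or filled with a dummy value, which is the benefit described in the review request.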
------------- PR: https://git.openjdk.org/jdk/pull/11429 From aph at openjdk.org Wed Nov 30 18:34:56 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 30 Nov 2022 18:34:56 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v4] In-Reply-To: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> Message-ID: <N0m99mMUujSXIj2etNVVDuLJ4Fgkr44Go4MGydKykSI=.6ac42829-1383-405e-9be9-12818f24be87@github.com> > This patch fixes the remaining null pointer dereference bugs that I know of. > > For the main bug, C2 was using a null reference to indicate an uninitialized `Node_List`. I replaced the null reference with a static sentinel. > > I also turned on `-fsanitize=null` and found and fixed a bunch of other null pointer dereferences. With this,I have run a full bootstrap and tier1 tests with `-fsanitize=null` enabled. > > I have checked that the code generated by GCC is not worse in any significant way, so I don't expect to see any performance regressions. > > I'd like to enable `-fsanitize=null` in debug builds to prevent regressions in this area. What do you think? Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Migrate postaloc.cpp migrated away from references to pointers when it comes to Node_List. Co-authored-by: Vladimir Ivanov <vaivanov at openjdk.org> - Merge from JDK head - Revert "Push ScopedValue tests" This reverts commit d298edfa9eda48ace9a27f83d38320fe6ba79e67. 
- Push ScopedValue tests - More - Next - Next - Next - Next ------------- Changes: https://git.openjdk.org/jdk/pull/10920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10920&range=03 Stats: 62 lines in 7 files changed: 24 ins; 1 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/10920.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10920/head:pull/10920 PR: https://git.openjdk.org/jdk/pull/10920 From aph at openjdk.org Wed Nov 30 18:38:44 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 30 Nov 2022 18:38:44 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v4] In-Reply-To: <0NmyytQNPzQ7YJowhCYd4K-nm-ahIftkvOi0o-e5PGE=.81c3d51e-c54c-4e90-8b4b-6d597bfb75cf@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <H7qFAMewkLJ4IWrVipLemv1iBgI5qrWOM1pfJ0p6hGk=.315fce9d-2c7d-42b8-a569-c74d8c7097f2@github.com> <WK7Sg9jDAwczPdU4Hax_iFJHBpipKrTBwAyXuX7IdlQ=.32813ab5-8c25-4dfe-9bc9-90d17c462af8@github.com> <syTpW1xc6IoV30N1_PLphGvd9jePaErgIFJ_bhCJoqU=.8ca9e2f2-1415-4514-9677-e319d89b05c0@github.com> <0NmyytQNPzQ7YJowhCYd4K-nm-ahIftkvOi0o-e5PGE=.81c3d51e-c54c-4e90-8b4b-6d597bfb75cf@github.com> Message-ID: <R33R9eC4027L3_j1QefeOLUvLhvcN96F5Df6keS-KnI=.bf47b789-8e99-48a9-b41e-c05a30820213@github.com> On Wed, 23 Nov 2022 10:54:27 GMT, Andrew Haley <aph at openjdk.org> wrote: > I am fine with enabling it for debug VM. But can you give at least some numbers? I'll do that separately. 
------------- PR: https://git.openjdk.org/jdk/pull/10920 From aph at openjdk.org Wed Nov 30 18:38:46 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 30 Nov 2022 18:38:46 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v4] In-Reply-To: <WBzNXPj7q8bjEEbF7X8GnyMKZ0h6SV1mEnnoj9il3EA=.8183b8cb-c2d9-4e57-b5bc-64fb1d959d14@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> <rNi-kFIlucDsCkWt1AGcDDz7wCDR-9USkGquiNM9_2I=.ed6b5beb-bd3c-478b-b2d4-622d476e8251@github.com> <GXcLOnn_At_5Cej39psYTY58lWVL8HeQHhr8hUN7a5E=.0125c2af-6ea8-4b4b-bf04-ae5981b81c02@github.com> <WBzNXPj7q8bjEEbF7X8GnyMKZ0h6SV1mEnnoj9il3EA=.8183b8cb-c2d9-4e57-b5bc-64fb1d959d14@github.com> Message-ID: <zMVTB_IIo8yTZYZ7ymF9HdHB2DC2c2h0RvqdjPdvCrI=.104cfdf1-4cb1-47f7-8c18-aeb141dcbcdc@github.com> On Thu, 3 Nov 2022 23:57:01 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Oh, I see it now... >> >> Considering `Node_List::_empty_list` is effectively unusable (except for `is_null()` query), I'd prefer to see `postaloc.cpp` migrated away from references to pointers when it comes to `Node_List`. It already does ugly things like [1] which your patch doesn't handle yet. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/postaloc.cpp#L260 >> >> How about the following patch for `postaloc.cpp`? Does it solve your problem? 
>> >> diff --git a/src/hotspot/share/opto/postaloc.cpp b/src/hotspot/share/opto/postaloc.cpp >> index 96c30a122bb..10c9d1f90ae 100644 >> --- a/src/hotspot/share/opto/postaloc.cpp >> +++ b/src/hotspot/share/opto/postaloc.cpp >> @@ -77,7 +77,7 @@ bool PhaseChaitin::may_be_copy_of_callee( Node *def ) const { >> >> //------------------------------yank----------------------------------- >> // Helper function for yank_if_dead >> -int PhaseChaitin::yank( Node *old, Block *current_block, Node_List *value, Node_List *regnd ) { >> +int PhaseChaitin::yank(Node *old, Block *current_block, Node_List *value, Node_List *regnd) { >> int blk_adjust=0; >> Block *oldb = _cfg.get_block_for_node(old); >> oldb->find_remove(old); >> @@ -87,9 +87,9 @@ int PhaseChaitin::yank( Node *old, Block *current_block, Node_List *value, Node_ >> } >> _cfg.unmap_node_from_block(old); >> OptoReg::Name old_reg = lrgs(_lrg_map.live_range_id(old)).reg(); >> - if( regnd && (*regnd)[old_reg]==old ) { // Instruction is currently available? >> - value->map(old_reg,NULL); // Yank from value/regnd maps >> - regnd->map(old_reg,NULL); // This register's value is now unknown >> + if (regnd != NULL && regnd->at(old_reg) == old) { // Instruction is currently available? >> + value->map(old_reg, NULL); // Yank from value/regnd maps >> + regnd->map(old_reg, NULL); // This register's value is now unknown >> } >> return blk_adjust; >> } >> @@ -161,7 +161,7 @@ int PhaseChaitin::yank_if_dead_recurse(Node *old, Node *orig_old, Block *current >> // Use the prior value instead of the current value, in an effort to make >> // the current value go dead. Return block iterator adjustment, in case >> // we yank some instructions from this block. >> -int PhaseChaitin::use_prior_register( Node *n, uint idx, Node *def, Block *current_block, Node_List &value, Node_List &regnd ) { >> +int PhaseChaitin::use_prior_register( Node *n, uint idx, Node *def, Block *current_block, Node_List *value, Node_List *regnd ) { >> // No effect?
>> if( def == n->in(idx) ) return 0; >> // Def is currently dead and can be removed? Do not resurrect >> @@ -207,7 +207,7 @@ int PhaseChaitin::use_prior_register( Node *n, uint idx, Node *def, Block *curre >> _post_alloc++; >> >> // Is old def now dead? We successfully yanked a copy? >> - return yank_if_dead(old,current_block,&value,&regnd); >> + return yank_if_dead(old,current_block,value,regnd); >> } >> >> >> @@ -229,7 +229,7 @@ Node *PhaseChaitin::skip_copies( Node *c ) { >> >> //------------------------------elide_copy------------------------------------- >> // Remove (bypass) copies along Node n, edge k. >> -int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &value, Node_List &regnd, bool can_change_regs ) { >> +int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List *value, Node_List *regnd, bool can_change_regs ) { >> int blk_adjust = 0; >> >> uint nk_idx = _lrg_map.live_range_id(n->in(k)); >> @@ -253,12 +253,13 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v >> >> // Phis and 2-address instructions cannot change registers so easily - their >> // outputs must match their input. >> - if( !can_change_regs ) >> + if (!can_change_regs) { >> return blk_adjust; // Only check stupid copies! >> - >> + } >> // Loop backedges won't have a value-mapping yet >> - if( &value == NULL ) return blk_adjust; >> - >> + if (value == NULL) { >> + return blk_adjust; >> + } >> // Skip through all copies to the _value_ being used. Do not change from >> // int to pointer. This attempts to jump through a chain of copies, where >> // intermediate copies might be illegal, i.e., value is stored down to stack >> @@ -273,10 +274,11 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v >> // See if it happens to already be in the correct register!
>> // (either Phi's direct register, or the common case of the name >> // never-clobbered original-def register) >> - if (register_contains_value(val, val_reg, n_regs, value)) { >> - blk_adjust += use_prior_register(n,k,regnd[val_reg],current_block,value,regnd); >> - if( n->in(k) == regnd[val_reg] ) // Success! Quit trying >> - return blk_adjust; >> + if (register_contains_value(val, val_reg, n_regs, *value)) { >> + blk_adjust += use_prior_register(n,k,regnd->at(val_reg),current_block,value,regnd); >> + if (n->in(k) == regnd->at(val_reg)) { >> + return blk_adjust; // Success! Quit trying >> + } >> } >> >> // See if we can skip the copy by changing registers. Don't change from >> @@ -304,7 +306,7 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v >> if (ignore_self) continue; >> } >> >> - Node *vv = value[reg]; >> + Node *vv = value->at(reg); >> // For scalable register, number of registers may be inconsistent between >> // "val_reg" and "reg". For example, when "val" resides in register >> // but "reg" is located in stack. >> @@ -325,7 +327,7 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v >> last = (n_regs-1); // Looking for the last part of a set >> } >> if ((reg&last) != last) continue; // Wrong part of a set >> - if (!register_contains_value(vv, reg, n_regs, value)) continue; // Different value >> + if (!register_contains_value(vv, reg, n_regs, *value)) continue; // Different value >> } >> if( vv == val || // Got a direct hit? 
>> (t && vv && vv->bottom_type() == t && vv->is_Mach() && >> @@ -333,9 +335,9 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List &v >> assert( !n->is_Phi(), "cannot change registers at a Phi so easily" ); >> if( OptoReg::is_stack(nk_reg) || // CISC-loading from stack OR >> OptoReg::is_reg(reg) || // turning into a register use OR >> - regnd[reg]->outcnt()==1 ) { // last use of a spill-load turns into a CISC use >> - blk_adjust += use_prior_register(n,k,regnd[reg],current_block,value,regnd); >> - if( n->in(k) == regnd[reg] ) // Success! Quit trying >> + regnd->at(reg)->outcnt()==1 ) { // last use of a spill-load turns into a CISC use >> + blk_adjust += use_prior_register(n,k,regnd->at(reg),current_block,value,regnd); >> + if( n->in(k) == regnd->at(reg) ) // Success! Quit trying >> return blk_adjust; >> } // End of if not degrading to a stack >> } // End of if found value in another register >> @@ -535,7 +537,7 @@ void PhaseChaitin::post_allocate_copy_removal() { >> Block* pb = _cfg.get_block_for_node(block->pred(j)); >> // Remove copies along phi edges >> for (uint k = 1; k < phi_dex; k++) { >> - elide_copy(block->get_node(k), j, block, *blk2value[pb->_pre_order], *blk2regnd[pb->_pre_order], false); >> + elide_copy(block->get_node(k), j, block, blk2value[pb->_pre_order], blk2regnd[pb->_pre_order], false); >> } >> if (blk2value[pb->_pre_order]) { // Have a mapping on this edge? >> // See if this predecessor's mappings have been used by everybody >> @@ -691,7 +693,7 @@ void PhaseChaitin::post_allocate_copy_removal() { >> >> // Remove copies along input edges >> for (k = 1; k < n->req(); k++) { >> - j -= elide_copy(n, k, block, value, regnd, two_adr != k); >> + j -= elide_copy(n, k, block, &value, &regnd, two_adr != k); >> } >> >> // Unallocated Nodes define no registers > > Sorry, missed a couple of null checks.
The following patch on top of the previous one passes hs-tier1/2: > > diff --git a/src/hotspot/share/opto/postaloc.cpp b/src/hotspot/share/opto/postaloc.cpp > index 10c9d1f90ae..b39a78eef48 100644 > --- a/src/hotspot/share/opto/postaloc.cpp > +++ b/src/hotspot/share/opto/postaloc.cpp > @@ -87,7 +87,8 @@ int PhaseChaitin::yank(Node *old, Block *current_block, Node_List *value, Node_L > } > _cfg.unmap_node_from_block(old); > OptoReg::Name old_reg = lrgs(_lrg_map.live_range_id(old)).reg(); > - if (regnd != NULL && regnd->at(old_reg) == old) { // Instruction is currently available? > + assert(value != NULL || regnd == NULL, "sanity"); > + if (value != NULL && regnd != NULL && regnd->at(old_reg) == old) { // Instruction is currently available? > value->map(old_reg, NULL); // Yank from value/regnd maps > regnd->map(old_reg, NULL); // This register's value is now unknown > } > @@ -257,7 +258,8 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List *v > return blk_adjust; // Only check stupid copies! > } > // Loop backedges won't have a value-mapping yet > - if (value == NULL) { > + assert(regnd != NULL || value == NULL, "sanity"); > + if (value == NULL || regnd == NULL) { > return blk_adjust; > } > // Skip through all copies to the _value_ being used. Do not change from Done. If you're happy with this I'll push after tests. Thanks! 
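The pattern behind the `postaloc.cpp` patch quoted above can be sketched in isolation: a C++ reference must always bind to an object, so a test such as `&value == NULL` relies on undefined behavior, whereas a pointer parameter can legitimately be null and be checked by the callee. The type and function below (`SketchList`, `lookup_or_default`) are hypothetical stand-ins for illustration, not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-register map; not HotSpot's Node_List.
struct SketchList {
  std::vector<int> slots;
  int at(std::size_t i) const { return slots[i]; }
};

// Problematic shape (shown only as a comment): with a reference parameter,
//   int lookup(const SketchList& value, std::size_t reg) {
//     if (&value == NULL) return -1;   // relies on a "null reference": UB
//     ...
//   }
// the caller would have to dereference a null pointer to get here at all.

// Pointer shape: "no mapping yet" is expressed as a null pointer, which the
// callee may test. Returns -1 when there is no mapping.
int lookup_or_default(const SketchList* value, std::size_t reg) {
  if (value == nullptr) {
    return -1;
  }
  return value->at(reg);
}
```

A caller that has no mapping simply passes `nullptr`, so no null pointer is ever dereferenced to form a reference; this is the kind of construct that `-fsanitize=null` is reported in this thread to flag.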
------------- PR: https://git.openjdk.org/jdk/pull/10920 From amenkov at openjdk.org Wed Nov 30 18:42:55 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 30 Nov 2022 18:42:55 GMT Subject: Integrated: 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests In-Reply-To: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> References: <T5WXmc3XPCTuQUOhDF7pAj7BwZq9XVcTHmkogNXFV5E=.b930bfe2-09ff-4ae7-8ba6-dbfca773bf4c@github.com> Message-ID: <GAXa0rgOY6YeYjWUQAacErdiy3iALMhLZXq0GU7EtIo=.12e6b2a9-1cc6-4bd9-873e-0531935e2ab1@github.com> On Mon, 28 Nov 2022 23:28:00 GMT, Alex Menkov <amenkov at openjdk.org> wrote: > The fix combines almost the same tests to 1 test to remove code duplication This pull request has now been integrated. Changeset: 53dd2143 Author: Alex Menkov <amenkov at openjdk.org> URL: https://git.openjdk.org/jdk/commit/53dd214318c7367ceccc511f1a5220797c5e253f Stats: 309 lines in 7 files changed: 61 ins; 245 del; 3 mod 8297742: Combine vmTestbase/nsk/monitoring/ThreadMXBean/resetPeakThreadCount tests Reviewed-by: dholmes, lmesnik, kevinw, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/11400 From kbarrett at openjdk.org Wed Nov 30 19:16:34 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 30 Nov 2022 19:16:34 GMT Subject: RFR: 8297852: Backout 8252584 for the time being In-Reply-To: <9rSgaMOb56fcYoRPJX6rgi1jcvzxQXsMqAod76PmF4s=.97ad88c3-1250-4aea-a1f9-1c78987a3ad6@github.com> References: <9rSgaMOb56fcYoRPJX6rgi1jcvzxQXsMqAod76PmF4s=.97ad88c3-1250-4aea-a1f9-1c78987a3ad6@github.com> Message-ID: <WuYbW9GW9RE5J-PuZ-KdbrhzJJ4yZ4hRgtncCafio0s=.0ecbd0be-c367-4bcf-88ee-64461becbcff@github.com> On Wed, 30 Nov 2022 15:00:32 GMT, Julian Waters <jwaters at openjdk.org> wrote: > Revert 8252584 temporarily (My mistake) Looks good. @vnkozlov says backout and do a new one (will need a new JBS issue). ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11433 From kvn at openjdk.org Wed Nov 30 19:35:35 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Nov 2022 19:35:35 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v4] In-Reply-To: <R33R9eC4027L3_j1QefeOLUvLhvcN96F5Df6keS-KnI=.bf47b789-8e99-48a9-b41e-c05a30820213@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <H7qFAMewkLJ4IWrVipLemv1iBgI5qrWOM1pfJ0p6hGk=.315fce9d-2c7d-42b8-a569-c74d8c7097f2@github.com> <WK7Sg9jDAwczPdU4Hax_iFJHBpipKrTBwAyXuX7IdlQ=.32813ab5-8c25-4dfe-9bc9-90d17c462af8@github.com> <syTpW1xc6IoV30N1_PLphGvd9jePaErgIFJ_bhCJoqU=.8ca9e2f2-1415-4514-9677-e319d89b05c0@github.com> <0NmyytQNPzQ7YJowhCYd4K-nm-ahIftkvOi0o-e5PGE=.81c3d51e-c54c-4e90-8b4b-6d597bfb75cf@github.com> <R33R9eC4027L3_j1QefeOLUvLhvcN96F5Df6keS-KnI=.bf47b789-8e99-48a9-b41e-c05a30820213@github.com> Message-ID: <IJbdl2owWEq4f4tVQN1ZVL0yi_n-EPTnS9WKt5ZLxi8=.072a2da2-704c-45af-b531-66d26f62d130@github.com> On Wed, 30 Nov 2022 18:36:29 GMT, Andrew Haley <aph at openjdk.org> wrote: > > I am fine with enabling it for debug VM. But can you give at least some numbers? > > I'll do that separately. Agree. I will run our testing with your latest changes. 
------------- PR: https://git.openjdk.org/jdk/pull/10920 From vlivanov at openjdk.org Wed Nov 30 20:04:42 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 30 Nov 2022 20:04:42 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v4] In-Reply-To: <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> Message-ID: <8p5DnhjjOlFQ6Tu_DEfJsvW7ac-MgQ8vP3_LSvocgGg=.16e04fa6-d7cc-4cc9-ade6-0fd8f83b4df4@github.com> On Tue, 1 Nov 2022 23:55:48 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote: >> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: >> >> - Migrate postaloc.cpp migrated away from references to pointers when it comes to Node_List. >> >> Co-authored-by: Vladimir Ivanov <vaivanov at openjdk.org> >> - Merge from JDK head >> - Revert "Push ScopedValue tests" >> >> This reverts commit d298edfa9eda48ace9a27f83d38320fe6ba79e67. >> - Push ScopedValue tests >> - More >> - Next >> - Next >> - Next >> - Next > > src/hotspot/share/opto/bytecodeInfo.cpp line 66: > >> 64: assert(!caller_jvms->should_reexecute(), "there should be no reexecute bytecode with inlining"); >> 65: } >> 66: assert(_caller_jvms == NULL > > I'd reshape the code and either get rid of `_caller_jvms` initialization on line 47 or replace it with `_caller_jvms(NULL),`. 
> > Then, I'd guard `_caller_jvms` initialization by `caller_jvms != NULL` and move the assert under the guard: > > if (caller_jvms != NULL) { > // Keep a private copy of the caller_jvms: > _caller_jvms = new (C) JVMState(caller_jvms->method(), caller_tree->caller_jvms()); > _caller_jvms->set_bci(caller_jvms->bci()); > assert(!caller_jvms->should_reexecute(), "there should be no reexecute bytecode with inlining"); > assert(caller_jvms->same_calls_as(_caller_jvms), "consistent JVMS"); > } > > > Or introduce a helper method which does a shallow copy of `caller_jvms` as part of initializing store on line 47. Please, initialize `_caller_jvms` to `NULL` (on line 47), switch the null check to `caller_jvms` (on line 60), and move the assert under the null check. ------------- PR: https://git.openjdk.org/jdk/pull/10920 From vlivanov at openjdk.org Wed Nov 30 20:04:44 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 30 Nov 2022 20:04:44 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references [v4] In-Reply-To: <zMVTB_IIo8yTZYZ7ymF9HdHB2DC2c2h0RvqdjPdvCrI=.104cfdf1-4cb1-47f7-8c18-aeb141dcbcdc@github.com> References: <mF8Lu-7gPvjgcY-Oy30aoKaJgBQwohBSMbs6BQovz24=.4b995332-90a9-440f-899b-07f7cbb3054c@github.com> <6lfTT_Djcw8AsOUW09Kh5BSqxFhveF5ASItS8O_E3C8=.25a1cf8e-73b3-40f0-a531-1cc34264807a@github.com> <rNi-kFIlucDsCkWt1AGcDDz7wCDR-9USkGquiNM9_2I=.ed6b5beb-bd3c-478b-b2d4-622d476e8251@github.com> <GXcLOnn_At_5Cej39psYTY58lWVL8HeQHhr8hUN7a5E=.0125c2af-6ea8-4b4b-bf04-ae5981b81c02@github.com> <WBzNXPj7q8bjEEbF7X8GnyMKZ0h6SV1mEnnoj9il3EA=.8183b8cb-c2d9-4e57-b5bc-64fb1d959d14@github.com> <zMVTB_IIo8yTZYZ7ymF9HdHB2DC2c2h0RvqdjPdvCrI=.104cfdf1-4cb1-47f7-8c18-aeb141dcbcdc@github.com> Message-ID: <pDjCV3HUo8yf-xjKiGKQNFiPT27vB9a2qrNnxMsnTNY=.ab22ee9a-2486-4705-964b-196d577c0cac@github.com> On Wed, 30 Nov 2022 18:35:37 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Sorry, missed a couple of null checks. 
The following patch on top of the previous one passes hs-tier1/2: >> >> diff --git a/src/hotspot/share/opto/postaloc.cpp b/src/hotspot/share/opto/postaloc.cpp >> index 10c9d1f90ae..b39a78eef48 100644 >> --- a/src/hotspot/share/opto/postaloc.cpp >> +++ b/src/hotspot/share/opto/postaloc.cpp >> @@ -87,7 +87,8 @@ int PhaseChaitin::yank(Node *old, Block *current_block, Node_List *value, Node_L >> } >> _cfg.unmap_node_from_block(old); >> OptoReg::Name old_reg = lrgs(_lrg_map.live_range_id(old)).reg(); >> - if (regnd != NULL && regnd->at(old_reg) == old) { // Instruction is currently available? >> + assert(value != NULL || regnd == NULL, "sanity"); >> + if (value != NULL && regnd != NULL && regnd->at(old_reg) == old) { // Instruction is currently available? >> value->map(old_reg, NULL); // Yank from value/regnd maps >> regnd->map(old_reg, NULL); // This register's value is now unknown >> } >> @@ -257,7 +258,8 @@ int PhaseChaitin::elide_copy( Node *n, int k, Block *current_block, Node_List *v >> return blk_adjust; // Only check stupid copies! >> } >> // Loop backedges won't have a value-mapping yet >> - if (value == NULL) { >> + assert(regnd != NULL || value == NULL, "sanity"); >> + if (value == NULL || regnd == NULL) { >> return blk_adjust; >> } >> // Skip through all copies to the _value_ being used. Do not change from > > Done. If you're happy with this I'll push after tests. Thanks! Thanks. Is the null check in `Node_Array` constructor still needed? 
------------- PR: https://git.openjdk.org/jdk/pull/10920 From alanb at openjdk.org Wed Nov 30 20:35:29 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 30 Nov 2022 20:35:29 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v36] In-Reply-To: <1oR6S6K1w-GPz7Mw67Sqw9s8mPI4YDyC9_FOOjIqJU4=.9e645539-2bba-4740-be1e-e61493a3252f@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> <1oR6S6K1w-GPz7Mw67Sqw9s8mPI4YDyC9_FOOjIqJU4=.9e645539-2bba-4740-be1e-e61493a3252f@github.com> Message-ID: <B07OEJseK_eBEE-1qP-FJ3DnhV6Hxl3AfGQN4MH3Lwc=.17eee93d-41ec-4b9c-85af-74d2cfb3a5b6@github.com> On Wed, 30 Nov 2022 18:14:00 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Marked as reviewed by alanb (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10872 From avoitylov at openjdk.org Wed Nov 30 20:36:18 2022 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Wed, 30 Nov 2022 20:36:18 GMT Subject: RFR: 8291302: ARM32: nmethod entry barriers support Message-ID: <BJaszr0_haU4UZOyQQEA9P0SOlgtfTtGFBj9VSRD5SM=.ffd6af56-a51e-4627-90cb-0eb0e1bbd5e8@github.com> This PR implements nmethod entry barriers for ARM32. It has already been implemented for other ports and is related with JDK-8290025 "Remove the Sweeper". 
------------- Commit messages: - 8291302: ARM32: nmethod entry barriers support Changes: https://git.openjdk.org/jdk/pull/11442/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11442&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291302 Stats: 250 lines in 11 files changed: 245 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/11442.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11442/head:pull/11442 PR: https://git.openjdk.org/jdk/pull/11442 From avoitylov at openjdk.org Wed Nov 30 20:45:36 2022 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Wed, 30 Nov 2022 20:45:36 GMT Subject: RFR: 8291302: ARM32: nmethod entry barriers support In-Reply-To: <BJaszr0_haU4UZOyQQEA9P0SOlgtfTtGFBj9VSRD5SM=.ffd6af56-a51e-4627-90cb-0eb0e1bbd5e8@github.com> References: <BJaszr0_haU4UZOyQQEA9P0SOlgtfTtGFBj9VSRD5SM=.ffd6af56-a51e-4627-90cb-0eb0e1bbd5e8@github.com> Message-ID: <IHVXjr0EnPdWPm7t6JI3pZR3l6phvm6kMmjoMqAmy-4=.a5c7ee78-33bb-41b5-9360-bedda8848cd8@github.com> On Wed, 30 Nov 2022 20:26:23 GMT, Aleksei Voitylov <avoitylov at openjdk.org> wrote: > This PR implements nmethod entry barriers for ARM32. It has already been implemented for other ports and is related with JDK-8290025 "Remove the Sweeper". Errors in pre-submit tests are clearly unrelated, GHA can't download the bundles for linux-x86: Error: Unable to find an artifact with the name: bundles-linux-x86 ------------- PR: https://git.openjdk.org/jdk/pull/11442 From ckozak at ckozak.net Wed Nov 30 20:43:39 2022 From: ckozak at ckozak.net (Carter Kozak) Date: Wed, 30 Nov 2022 15:43:39 -0500 Subject: Extend Native Memory Tracking over the JDK ? 
(was: Proposal: track zlib native memory usage with NMT) In-Reply-To: <4f7deee9-366d-05a2-9268-09a25a138d8d@oracle.com> References: <CAA-vtUzvyqb_LZ9d1Z80oSvzyqpi89053O3B+oDZF4oZ85CuZg@mail.gmail.com> <c5db2105-3b78-0b77-779f-011c8649f476@oracle.com> <CAA-vtUxrP49w523EQ3yKFh4Zo5W+HMtqpGcsUxOEMkwZUOQ0zQ@mail.gmail.com> <a20d3253-47b6-41dc-b8e4-0b894137ed90@app.fastmail.com> <4f7deee9-366d-05a2-9268-09a25a138d8d@oracle.com> Message-ID: <cb8b2707-4f43-4e91-83ac-9d3b1f6df2bb@app.fastmail.com> This looks fantastic, thank you so much! I can confirm that the proposed design would solve my use-case. I'd enjoy discussing the NMT event contract somewhere more specific to the implementation, but I don't want to muddle this thread with implementation details. Carter Kozak On Wed, Nov 30, 2022, at 03:37, Stefan Johansson wrote: > Hi Carter, > > Your mail made me pick up an old item from my wishlist: to have native > memory tracking information available in JFR recordings. When we, in GC, > do improvements to decrease the native memory overhead of our > algorithms, NMT is a very good tool to track the progress. We have > scripts that sound very similar to what you describe and more than once > I've been thinking about adding this information into JFR. But it has > not been a priority and the greater value has been unclear. > > Hearing that others might also benefit from such a change I took a > discussion with the JFR team on how to best proceed with this. I have > created a branch for this and will probably create a PR for it shortly, > but I thought I would drop it here first: > https://github.com/kstefanj/jdk/tree/8157023-jfr-events-for-nmt > > The change adds two new JFR events: one for the total usage and one for > the usage of each memory type. These are sent only if Native Memory > Tracking is turned on, and they are enabled in the default JFR profile > with an interval of 1s. This might change during reviewing but it was a > good starting point. 
> > With this you will be able to use JFR streaming to access the events > from within your running process. I hope this will help your use cases > and please let us know if you have any comments or suggestions. > > Thanks, > Stefan From eosterlund at openjdk.org Wed Nov 30 21:03:18 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 30 Nov 2022 21:03:18 GMT Subject: RFR: 8291302: ARM32: nmethod entry barriers support In-Reply-To: <BJaszr0_haU4UZOyQQEA9P0SOlgtfTtGFBj9VSRD5SM=.ffd6af56-a51e-4627-90cb-0eb0e1bbd5e8@github.com> References: <BJaszr0_haU4UZOyQQEA9P0SOlgtfTtGFBj9VSRD5SM=.ffd6af56-a51e-4627-90cb-0eb0e1bbd5e8@github.com> Message-ID: <DtHvGM7UYziT5t9mqAihHlOR7WqwoE7RMvHY0Pt3PJU=.47aaab8a-9c1a-4158-9925-3f75df9d14b2@github.com> On Wed, 30 Nov 2022 20:26:23 GMT, Aleksei Voitylov <avoitylov at openjdk.org> wrote: > This PR implements nmethod entry barriers for ARM32. It has already been implemented for other ports and is related with JDK-8290025 "Remove the Sweeper". When you call the nmethod entry barrier slow path, floating point arguments are live, but can get clobbered by the C++ runtime. You need to save them. That's probably why you can't print in the slow path without crashing. ------------- Changes requested by eosterlund (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11442 From dlong at openjdk.org Wed Nov 30 21:39:20 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Nov 2022 21:39:20 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33] In-Reply-To: <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> Message-ID: <_Gn9ItyEwKzyhunfxC5b6wz7BZzPg2qD_e-wvhEqR-Q=.5be37213-7be9-40c7-8503-392bacc1cfa9@github.com> On Thu, 24 Nov 2022 14:05:41 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Unused variable src/hotspot/cpu/aarch64/aarch64.ad line 3635: > 3633: } > 3634: } else if (_method->intrinsic_id() == vmIntrinsicID::_ensureMaterializedForStackWalk) { > 3635: __ nop(); Please add a comment explaining why the nop is needed or desirable here. src/hotspot/cpu/x86/x86_64.ad line 2174: > 2172: RELOC_DISP32); > 2173: } else if (_method->intrinsic_id() == vmIntrinsicID::_ensureMaterializedForStackWalk) { > 2174: __ addr_nop_5(); Needs a comment. I guess this is because of how call sizes are computed. 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From dlong at openjdk.org Wed Nov 30 21:43:13 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Nov 2022 21:43:13 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33] In-Reply-To: <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> Message-ID: <00vRbThlo7WIZeqbqMNYNYQKiBH2aFP0O5d-swCOvtY=.f02f4b5a-fb86-4861-affd-000438b83b47@github.com> On Thu, 24 Nov 2022 14:05:41 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Unused variable src/hotspot/share/classfile/javaClasses.cpp line 1731: > 1729: } > 1730: > 1731: void java_lang_Thread::clear_scopedValueBindings(oop java_thread) { It looks like there is only one caller of this method that takes an oop. Would it make sense to have the caller check for the null oop, and assert that the oop is not null here? 
------------- PR: https://git.openjdk.org/jdk/pull/10952 From dlong at openjdk.org Wed Nov 30 21:51:25 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Nov 2022 21:51:25 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33] In-Reply-To: <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> Message-ID: <M18a3ZLOyyHRpwfez7cHkBr6ZNLUQq7VGfBm_VZhOYU=.b2fbaa34-d51a-4d69-a06f-3df096c2b65f@github.com> On Thu, 24 Nov 2022 14:05:41 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Unused variable src/hotspot/share/prims/jvm.cpp line 1385: > 1383: vframeStream vfst(thread); > 1384: for(; !vfst.at_end(); vfst.next()) { > 1385: int loc = 0; Use -1 instead (see below)? src/hotspot/share/prims/jvm.cpp line 1400: > 1398: } > 1399: > 1400: if (loc != 0) { As 0 is normally a valid local number, how about using -1 to mean "not found"? 
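A minimal Java sketch of the sentinel point raised in the comment above. The code under review is C++ in jvm.cpp; the class, method, and variable names below are hypothetical and only illustrate the idea. Because 0 is a valid local slot number, a search that initialises its result to 0 cannot tell "found in slot 0" apart from "not found", whereas -1 can never collide with a real slot.

```java
import java.util.List;

// Hypothetical illustration only; not the jvm.cpp code under review.
final class LocalSlotSketch {
    // -1 can never be a valid slot index, so it is safe as "not found".
    static final int NOT_FOUND = -1;

    // Returns the index of the first local whose name matches, or NOT_FOUND.
    static int findSlot(List<String> localNames, String wanted) {
        int loc = NOT_FOUND;
        for (int i = 0; i < localNames.size(); i++) {
            if (localNames.get(i).equals(wanted)) {
                loc = i;
                break;
            }
        }
        return loc;
    }

    public static void main(String[] args) {
        List<String> locals = List.of("bindings", "tmp");
        // A hit in slot 0 stays distinguishable from a miss.
        System.out.println(findSlot(locals, "bindings") != NOT_FOUND); // true
        System.out.println(findSlot(locals, "missing") != NOT_FOUND);  // false
    }
}
```

With 0 as the initial value, the first call would look like a miss even though the name sits in slot 0.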
------------- PR: https://git.openjdk.org/jdk/pull/10952 From dlong at openjdk.org Wed Nov 30 21:56:43 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Nov 2022 21:56:43 GMT Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33] In-Reply-To: <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> Message-ID: <7w-QaPG38IyBYlx7Y92glF_u0ze0N5hP_TWS1HSFuVo=.d22eefc4-b13c-44f1-a031-d5d555bec396@github.com> On Thu, 24 Nov 2022 14:05:41 GMT, Andrew Haley <aph at openjdk.org> wrote: >> JEP 429 implementation. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Unused variable I made a few minor suggestions, but overall the HotSpot changes look good. Nice job Andrew. ------------- Marked as reviewed by dlong (Reviewer). PR: https://git.openjdk.org/jdk/pull/10952 From mcimadamore at openjdk.org Wed Nov 30 21:56:51 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 30 Nov 2022 21:56:51 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v37] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <60l3vr69yRpaCUeht5gNEVsf7ODpvdqFHpHdjxnfkAo=.a9b6e1de-4f38-4cf2-a840-e5cb249c522c@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/df8a4a63..198f30c0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=36 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=35-36 Stats: 6 lines in 2 files changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Wed Nov 30 22:05:59 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 30 Nov 2022 22:05:59 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v38] In-Reply-To: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> References: <x7ygjJqN-6Fpr3TqWadsqGflzDvYhi2QscqJJ1VM7KY=.f815a2ae-1bba-4c02-8faf-e5132fa974b5@github.com> Message-ID: <LmkFmp9yW9kSwMNUvERYQo7aI3C_PSeXoU0I7Yq-Pu4=.1e126451-43e7-43f0-938d-401ce907b7cc@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 70 commits:

 - Merge branch 'master' into PR_20
 - Address review comments
 - Address review comments
 - Merge branch 'master' into PR_20
 - Address review comment
 - Polish javadoc:
   * Make sure that first para of class javadoc is succinct and descriptive
   * Remove references to "access" var handle or "memory segment view" var handle (just use var handle)
   * Minor tweak to layout classes javadoc - use `@see` in value layouts instead of a dedicated para.
   * Other minor typos fixes
 - Address review comments
 - * remove unused Scoped interface
   * re-add trusting of final fields in layout class implementations
   * Fix BulkOps benchmark, which had alignment issues
 - Fix bit vs. byte mismatch in test
 - Fix wrong check in MemorySegment::spliterator/elements
   (The check which ensures that the segment size is multiple of spliterator element size is bogus)
 - ... and 60 more: https://git.openjdk.org/jdk/compare/4485d4e5...8b5dc0f0

-------------

Changes: https://git.openjdk.org/jdk/pull/10872/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=37
Stats: 13807 lines in 254 files changed: 5780 ins; 4448 del; 3579 mod
Patch: https://git.openjdk.org/jdk/pull/10872.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872
PR: https://git.openjdk.org/jdk/pull/10872

From dlong at openjdk.org Wed Nov 30 22:07:13 2022
From: dlong at openjdk.org (Dean Long)
Date: Wed, 30 Nov 2022 22:07:13 GMT
Subject: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33]
In-Reply-To: <nvUOFle6Lyz7hONaLZhECGu-MKhbk3Zngvy85-tQw28=.15ec4a6d-f5f7-435b-a0d7-667142dd1056@github.com>
References: <GzS4FCR3pe2UG9degJJV3dP2RwUG8X8BTVpYobXldIs=.4c6dedaa-98d4-4288-8f55-bfd3c6b88c79@github.com> <lrGP8yOwAQ9WoHM5MesCdosl1FVpO6J_kLNVBY7W--M=.a1398409-df9e-4ff8-a061-f85bb6c6cbcc@github.com> <nvUOFle6Lyz7hONaLZhECGu-MKhbk3Zngvy85-tQw28=.15ec4a6d-f5f7-435b-a0d7-667142dd1056@github.com>
Message-ID:
<AevJHhXSSTFzaWOq3BKGGFJkDbeFeEoFhIW4-TEMctA=.4a078d6c-edae-460c-90a6-a9c39472c3f3@github.com> On Tue, 29 Nov 2022 11:49:10 GMT, Andrew Haley <aph at openjdk.org> wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Unused variable > > src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java line 385: > >> 383: try { >> 384: JLA.setScopedValueBindings(newSnapshot); >> 385: JLA.ensureMaterializedForStackWalk(newSnapshot); > > Question: is it necessary here to invoke `ensureMaterializedForStackWalk()`? It's really only there to prevent the new `Snapshot` from being scalar replaced. But we know that it cannot be scalar replaced, because it really does escape: a pointer to it is stored in the current `Thread`. So should we simply remove the call to `ensureMaterializedForStackWalk()`, on the grounds that it cannot have any effect? It does seem unnecessary here, but I'm not an expert on current and future C2 escape analysis. @vnkozlov, do you agree? ------------- PR: https://git.openjdk.org/jdk/pull/10952
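The escape argument in the message above can be sketched in plain Java. This is only an illustration under stated assumptions: `Box` and the `published` field are hypothetical stand-ins, not the `Snapshot` or `Thread` code from the PR, and whether a given JIT actually performs scalar replacement depends on inlining and other factors that this sketch does not control or demonstrate.

```java
// Hypothetical illustration only; Box and the field name are stand-ins,
// not the ScopedValue.Snapshot or Thread code discussed above.
final class EscapeSketch {
    static final class Box {
        final int a, b;
        Box(int a, int b) { this.a = a; this.b = b; }
    }

    // Stands in for a field reachable from outside the method, such as a
    // field of the current Thread.
    static Box published;

    // The Box never leaves this method, so a JIT compiler is free to
    // (but not obliged to) replace it with two scalar values.
    static int sumLocal(int a, int b) {
        Box box = new Box(a, b);
        return box.a + box.b;
    }

    // The Box is stored into a field, so it escapes and has to exist as
    // a real heap object that other code can observe.
    static int sumPublished(int a, int b) {
        Box box = new Box(a, b);
        published = box;
        return box.a + box.b;
    }

    public static void main(String[] args) {
        System.out.println(sumLocal(1, 2));      // 3
        System.out.println(sumPublished(3, 4));  // 7
        System.out.println(published != null);   // true
    }
}
```

The second method mirrors the situation described in the thread: once a reference is stored where other code can reach it, the object cannot be eliminated, so an extra "keep this materialized" hint would have nothing left to prevent.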