From duke at openjdk.java.net Wed Dec 1 00:02:14 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Wed, 1 Dec 2021 00:02:14 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v2] In-Reply-To: References: Message-ID: > Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. > > 5986.947899319073 MB/s => 24041.05203089616 MB/s > 5840.02689336947 MB/s => 24898.781468710356 MB/s > > ********** Original *********** > > > scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 > offset = 0 > msgSize = 512 bytes > iters = 20000000 > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(byte[]) runtime = 1.710387358 seconds > CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds > CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > > > > > *********** With my changes: ************* > > > > scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 > offset = 0 > msgSize = 512 bytes > iters = 20000000 > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(byte[]) runtime = 0.425938099 seconds > CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds > CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s 
> CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Adding CRC32-C microbenchmark. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6595/files - new: https://git.openjdk.java.net/jdk/pull/6595/files/10aeaec6..fd87bb92 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6595&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6595&range=00-01 Stats: 62 lines in 1 file changed: 62 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6595.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6595/head:pull/6595 PR: https://git.openjdk.java.net/jdk/pull/6595 From duke at openjdk.java.net Wed Dec 1 00:13:27 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Wed, 1 Dec 2021 00:13:27 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v2] In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 00:02:14 GMT, Scott Gibbons wrote: >> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. 
Hi, Eric.
I added a microbenchmark for CRC32-C. I'm waiting for full completion, but it looks like somewhere around 40GB/s throughput on average. I'll post the results once completed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6595 From duke at openjdk.java.net Wed Dec 1 01:44:31 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Wed, 1 Dec 2021 01:44:31 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v2] In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 00:02:14 GMT, Scott Gibbons wrote: >> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. >> >> 5986.947899319073 MB/s => 24041.05203089616 MB/s >> 5840.02689336947 MB/s => 24898.781468710356 MB/s >> >> ********** Original *********** >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 1.710387358 seconds >> CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds >> CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> >> >> >> >> *********** With my changes: ************* >> >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 0.425938099 seconds >> 
CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds >> CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Adding CRC32-C microbenchmark.

Benchmark results:

Benchmark                    (count)  Mode  Cnt  Score   Error  Units
TestCRC32C.testCRC32CUpdate       64  avgt    6  0.021 ± 0.001  us/op
TestCRC32C.testCRC32CUpdate      128  avgt    6  0.031 ± 0.001  us/op
TestCRC32C.testCRC32CUpdate      256  avgt    6  0.023 ± 0.001  us/op
TestCRC32C.testCRC32CUpdate      512  avgt    6  0.026 ± 0.001  us/op
TestCRC32C.testCRC32CUpdate     1024  avgt    6  0.035 ± 0.002  us/op
TestCRC32C.testCRC32CUpdate     2048  avgt    6  0.052 ± 0.001  us/op
TestCRC32C.testCRC32CUpdate     4096  avgt    6  0.092 ± 0.001  us/op
TestCRC32C.testCRC32CUpdate     8192  avgt    6  0.174 ± 0.001  us/op
TestCRC32C.testCRC32CUpdate    16384  avgt    6  0.337 ± 0.001  us/op
TestCRC32C.testCRC32CUpdate    32768  avgt    6  0.663 ± 0.002  us/op
TestCRC32C.testCRC32CUpdate    65536  avgt    6  1.317 ± 0.004  us/op

Finished running test 'micro:java.util.TestCRC32C'

------------- PR: https://git.openjdk.java.net/jdk/pull/6595 From sviswanathan at openjdk.java.net Wed Dec 1 02:00:27 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 1 Dec 2021 02:00:27 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v2] In-Reply-To: References: Message-ID: <2IzSrEz0FHnhJDfgbC4vdzW5yIsVCUs_EY8V_AuQyJI=.fe82513c-fe98-49e1-a622-c7805dbdd23e@github.com> On Wed, 1 Dec 2021 00:02:14 GMT, Scott Gibbons wrote: >> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement.
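As a cross-check on the throughput figures quoted in this thread, here is a minimal sketch that recomputes them from message size, iteration count and runtime. It assumes the test reports decimal megabytes (10^6 bytes) per second, which is consistent with the posted numbers, and that the JMH `count` parameter is the number of bytes processed per operation. The class and method names are made up for illustration and do not appear in the patch.

```java
public final class CrcThroughputCheck {
    // Throughput in decimal MB/s for `iters` updates of `msgSize` bytes
    // that completed in `seconds`. Assumption: 1 MB = 1e6 bytes.
    static double mbPerSecond(long msgSize, long iters, double seconds) {
        return (msgSize * iters) / seconds / 1e6;
    }

    // JMH avgt scores in this thread are in us/op. Bytes per microsecond is
    // numerically equal to decimal MB/s. Assumption: `count` is bytes per op.
    static double mbPerSecondFromScore(long bytesPerOp, double microsPerOp) {
        return bytesPerOp / microsPerOp;
    }

    public static void main(String[] args) {
        // Runtimes taken from the TestCRC32C output in the PR description.
        System.out.printf("original byte[]: %.1f MB/s%n",
                mbPerSecond(512, 20_000_000L, 1.710387358));
        System.out.printf("patched  byte[]: %.1f MB/s%n",
                mbPerSecond(512, 20_000_000L, 0.425938099));
        // Scores taken from the microbenchmark table.
        System.out.printf("JMH count=512:   %.1f MB/s%n",
                mbPerSecondFromScore(512, 0.026));
        System.out.printf("JMH count=65536: %.1f MB/s%n",
                mbPerSecondFromScore(65536, 1.317));
    }
}
```

Under these assumptions the first two lines reproduce the 5986.9 MB/s and 24041.1 MB/s figures from the description. The JMH rows work out to roughly 19.7 GB/s at 512 bytes and roughly 49.8 GB/s at 65536 bytes, keeping in mind that the scores are rounded to three decimals.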
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6588: > 6586: __ push(y); > 6587: __ push(z); > 6588: #endif a, j, y and z are only required on the crc32c_ipl_alg2_alt2() path, so should be initialized and saved/restored only there. This will also help you to pick a save on call register like c_rarg3 for table without having to push/pop. ------------- PR: https://git.openjdk.java.net/jdk/pull/6595 From sviswanathan at openjdk.java.net Wed Dec 1 02:06:28 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 1 Dec 2021 02:06:28 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v2] In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 00:02:14 GMT, Scott Gibbons wrote: >> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. >> >> 5986.947899319073 MB/s => 24041.05203089616 MB/s >> 5840.02689336947 MB/s => 24898.781468710356 MB/s >> >> ********** Original *********** >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 1.710387358 seconds >> CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds >> CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> >> >> >> >> *********** With my changes: ************* >> >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> 
iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 0.425938099 seconds >> CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds >> CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Adding CRC32-C microbenchmark.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 7218:

> 7216:   // context for the registers used, where all instructions below are using 128-bit mode
> 7217:   // On EVEX without VL and BW, these instructions will all be AVX.
> 7218:   notl(crc);

We could do this notl(crc) in generate_updateBytesCRC32(), thereby removing the need to do the double notl for crc32c.

------------- PR: https://git.openjdk.java.net/jdk/pull/6595 From dholmes at openjdk.java.net Wed Dec 1 02:36:27 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 1 Dec 2021 02:36:27 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: References: Message-ID: <72-LBIFmV82Z0GRieH-cxjyrub8Z-VVC4izGsW_bc4A=.1e021a98-3a4e-4ccc-87a7-c3e01c26d5e5@github.com> On Tue, 30 Nov 2021 20:44:43 GMT, Aleksey Shipilev wrote: >> I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there. >> >> Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors.
But these tests are rather fast themselves and cover important compiler features. >> >> We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3. >> >> Sample times for new subgroups (think about this as "How much time they add to existing tiers"): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 >> ============================== >> >> real 2m16.518s >> user 35m40.839s >> sys 1m35.334s >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 >> ============================== >> >> real 4m31.935s >> user 71m54.617s >> sys 2m13.073s > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Filter out tier1/2 groups too @shipilev again I think we need to examine this in terms of impact to our CI. We run different platforms and configurations in different tiers so the costs are not as simple as looking at one run. Thanks, David ------------- PR: https://git.openjdk.java.net/jdk/pull/6622 From duke at openjdk.java.net Wed Dec 1 02:42:29 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Wed, 1 Dec 2021 02:42:29 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v2] In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 00:02:14 GMT, Scott Gibbons wrote: >> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. 
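Background for the `notl(crc)` discussion in this thread: a minimal sketch, assuming only the standard CRC-32C definition (reflected Castagnoli polynomial 0x82F63B78, register complemented on entry and on exit), of how the inversion relates to the raw register update. The class `Crc32cInversionSketch` and its methods are hypothetical names for illustration. This is a bit-at-a-time reference, not the HotSpot stub code, and it says nothing about where the stubs in this PR actually place their inversions.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

public final class Crc32cInversionSketch {
    // Reflected CRC-32C (Castagnoli) polynomial.
    private static final int POLY = 0x82F63B78;

    // Raw register update, one bit at a time, with no inversion on entry or exit.
    static int rawUpdate(int reg, byte[] data) {
        for (byte b : data) {
            reg ^= (b & 0xFF);
            for (int i = 0; i < 8; i++) {
                // -(reg & 1) is all ones when the low bit is set, else zero.
                reg = (reg >>> 1) ^ (POLY & -(reg & 1));
            }
        }
        return reg;
    }

    // Checksum-style update: complement before and after the raw update.
    // Chaining two calls complements the intermediate value twice, which
    // cancels out, so only the outermost pair of inversions matters.
    static int checksum(int crc, byte[] data) {
        return ~rawUpdate(~crc, data);
    }

    public static void main(String[] args) {
        byte[] msg = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32C ref = new CRC32C();
        ref.update(msg, 0, msg.length);
        System.out.printf("sketch = %08x, java.util.zip.CRC32C = %08x%n",
                checksum(0, msg), ref.getValue());
    }
}
```

The point of the sketch is only that a kernel which inverts internally and a caller which also inverts end up with a redundant pair of `not` operations, which is the cost being weighed in the review comments.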
Doesn't this just move the double nots to a different generator? I'm not comfortable with the cost/benefit of this change. I don't want to impact CRC32 for the sake of CRC32C. I'll do it if you think it's worth it. Please let me know.

From: sviswa7
Sent: Tuesday, November 30, 2021 6:04 PM
To: openjdk/jdk
Cc: Gibbons, Scott; Mention
Subject: Re: [openjdk/jdk] 8277358: Accelerate CRC32-C (PR #6595)

@sviswa7 commented on this pull request.

In src/hotspot/cpu/x86/macroAssembler_x86.cpp:

> @@ -7210,7 +7215,6 @@ void MacroAssembler::kernel_crc32_avx512(Register crc, Register buf, Register le
>    // For EVEX with VL and BW, provide a standard mask, VL = 128 will guide the merge
>    // context for the registers used, where all instructions below are using 128-bit mode
>    // On EVEX without VL and BW, these instructions will all be AVX.
> -  lea(key, ExternalAddress(StubRoutines::x86::crc_table_avx512_addr()));
>    notl(crc);

We could do this notl(crc) in generate_updateBytesCRC32(), thereby removing the need to do the double notl for crc32c.

------------- PR: https://git.openjdk.java.net/jdk/pull/6595 From duke at openjdk.java.net Wed Dec 1 02:50:39 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Wed, 1 Dec 2021 02:50:39 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v2] In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 02:03:29 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding CRC32-C microbenchmark.
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 7218:
>
>> 7216:   // context for the registers used, where all instructions below are using 128-bit mode
>> 7217:   // On EVEX without VL and BW, these instructions will all be AVX.
>> 7218:   notl(crc);
>
> We could do this notl(crc) in generate_updateBytesCRC32(), thereby removing the need to do the double notl for crc32c.

Moved the `notl(crc)` calls to `generate_updateBytesCRC32()` as requested.

> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6588:
>
>> 6586:   __ push(y);
>> 6587:   __ push(z);
>> 6588: #endif
>
> a, j, y and z are only required on the crc32c_ipl_alg2_alt2() path, so should be initialized and saved/restored only there.
> This will also help you to pick a save on call register like c_rarg3 for table without having to push/pop.

Done. Changed to use `j` instead of save/restore of `r14`.

------------- PR: https://git.openjdk.java.net/jdk/pull/6595 From duke at openjdk.java.net Wed Dec 1 02:56:06 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Wed, 1 Dec 2021 02:56:06 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v3] In-Reply-To: References: Message-ID: <_uLnNO847-foX9jDiBc2nSNfiwhgl5pV-0WV5NXViZg=.4c73d6cb-966a-42f2-aade-87edfc69a316@github.com> > Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement.
> > 5986.947899319073 MB/s => 24041.05203089616 MB/s > 5840.02689336947 MB/s => 24898.781468710356 MB/s > > ********** Original *********** > > > scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 > offset = 0 > msgSize = 512 bytes > iters = 20000000 > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(byte[]) runtime = 1.710387358 seconds > CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds > CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > > > > > *********** With my changes: ************* > > > > scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 > offset = 0 > msgSize = 512 bytes > iters = 20000000 > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(byte[]) runtime = 0.425938099 seconds > CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds > CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fixing review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6595/files - new: 
https://git.openjdk.java.net/jdk/pull/6595/files/fd87bb92..92b4b9fc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6595&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6595&range=01-02 Stats: 32 lines in 2 files changed: 11 ins; 16 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6595.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6595/head:pull/6595 PR: https://git.openjdk.java.net/jdk/pull/6595 From duke at openjdk.java.net Wed Dec 1 03:08:26 2021 From: duke at openjdk.java.net (xpbob) Date: Wed, 1 Dec 2021 03:08:26 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 11:34:12 GMT, Erik Gahlin wrote:

> What about overhead (if JFR is disabled)?
>
> This looks like it could be a hot path for some applications.

Thanks

**result:**

|jdk|cores|
|-|-|
|disable jfr(--disable-jvm-feature-jfr)|3.032 ± 0.059|
|remove unsafe event|2.938 ± 0.071|
|unsafe event|2.938 ± 0.065|

**env:**
os: mac os
mem: 16g
cpu: 2.6 GHz 6core Intel Core i7
framework: jmh

**testcode:**

    import java.lang.reflect.Field;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Threads;
    import org.openjdk.jmh.annotations.Warmup;

    // Assumption: the message does not show its imports; sun.misc.Unsafe is assumed here.
    import sun.misc.Unsafe;

    @BenchmarkMode(Mode.Throughput)
    @Warmup(iterations = 3)
    @Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
    @Threads(8)
    @Fork(2)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public class UnsafeTest {

        @Benchmark
        public void mallocAndFree() throws Exception {
            // Obtain the Unsafe singleton reflectively.
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            Unsafe unsafe = (Unsafe) field.get(null);
            // Allocate and immediately free 1000 bytes, 10000 times per invocation.
            for (int i = 0; i < 10000; i++) {
                long l = unsafe.allocateMemory(1000);
                unsafe.freeMemory(l);
            }
        }
    }

------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From sviswanathan at openjdk.java.net Wed Dec 1 03:22:29 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 1 Dec 2021 03:22:29 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs In-Reply-To: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com> References:
<1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com> Message-ID: <40r83G1b4hCgLKRXjIYAR66Ie7MkXj2mAAOe1_XkyJc=.c6255006-6adf-41f2-816b-548eae911090@github.com> On Tue, 23 Nov 2021 06:49:07 GMT, David Holmes wrote: >> @dholmes-ora I have implemented your review comments. > > Sorry @sviswa7 but could you explain in the comment why/how `avx3_threshold` reporting zero impacts the use 64-byte load/store - the connection is not at all obvious for anyone not fully conversant with AVX3 and how it is used by the code. Thanks. @dholmes-ora @neliasso Please do approve the patch if it looks ok to you. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From jiefu at openjdk.java.net Wed Dec 1 03:41:30 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 1 Dec 2021 03:41:30 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 06:05:48 GMT, Jie Fu wrote: >> @sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks. > >> @dholmes-ora We see about 25% gain on a micro on our latest platform. There is no cpuid bit for this, so the closest was to check for the new serialize ISA supported on this platform. > > It would be better to add a jmh test for this opt. > Thanks. > @DamonFool There are jmh tests for Arraycopy in test/micro/org/openjdk/bench/java/lang/Arraycopy.java. So how about posting the detailed perf data before and after this patch? Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From duke at openjdk.java.net Wed Dec 1 03:57:36 2021 From: duke at openjdk.java.net (duke) Date: Wed, 1 Dec 2021 03:57:36 GMT Subject: Withdrawn: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations In-Reply-To: References: Message-ID: <2d_WMxO5r1d9GBXIK50u9wQUeIBEWeeu8_rnDMU1_g8=.13dd278a-5f98-47a3-92e3-bd7eb6642ff8@github.com> On Tue, 17 Aug 2021 00:20:13 GMT, John Tortugo wrote:

> Hi, can I please get some reviews for this Pull Request? Here is a summary of the changes:
>
> - Add tests, using the new IR-based test framework, for several of the Ideal transformations on Add, Sub, Mul, Div, Loop nodes and some simple Scalar Replacement transformations.
> - Add more default IR regexes to the IR-based test framework.
> - Changes to Sub, Div and Add Ideal nodes so that transformations on Int and Long types are the same whenever possible.
> - Changes to Sub*Node, Div*Node and Add*Node Ideal methods to fix some bugs and include new transformations.
> - New JTREG "ir_transformations" test group under test/hotspot/jtreg.

This pull request has been closed without being integrated.

------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From sviswanathan at openjdk.java.net Wed Dec 1 04:14:27 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 1 Dec 2021 04:14:27 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 03:38:00 GMT, Jie Fu wrote:

> > @DamonFool There are jmh tests for Arraycopy in test/micro/org/openjdk/bench/java/lang/Arraycopy.java.
>
> So how about posting the detailed perf data before and after this patch? Thanks.

Before:

Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  19.538 ± 0.073  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  20.513 ± 0.104  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  15.919 ± 0.652  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  15.669 ± 0.359  ns/op

After:

Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  16.957 ± 0.584  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  17.221 ± 0.036  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  12.952 ± 0.068  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  13.562 ± 0.124  ns/op

------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From jiefu at openjdk.java.net Wed Dec 1 07:11:27 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 1 Dec 2021 07:11:27 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs In-Reply-To: References: Message-ID: <7At_ag4fpFbiKQ81BsVoM9MzmXJJCIPufYSoi1WgIpg=.e2085ff5-34d8-4c7a-a561-9b36100f14a4@github.com> On Wed, 1 Dec 2021 03:38:00 GMT, Jie Fu wrote:

>>> @dholmes-ora We see about 25% gain on a micro on our latest platform. There is no cpuid bit for this, so the closest was to check for the new serialize ISA supported on this platform.
>>
>> It would be better to add a jmh test for this opt.
>> Thanks.
>
>> @DamonFool There are jmh tests for Arraycopy in test/micro/org/openjdk/bench/java/lang/Arraycopy.java.
>
> So how about posting the detailed perf data before and after this patch?
> Thanks.

Thanks, @sviswa7, for sharing. So the performance numbers look good on Intel's latest AVX512 platform.

We don't use the 64-byte instructions by default on Intel's older AVX512 platforms, right? If so, could this patch cause a performance regression on the older platforms? Thanks.

------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From shade at openjdk.java.net Wed Dec 1 08:23:31 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 1 Dec 2021 08:23:31 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: <72-LBIFmV82Z0GRieH-cxjyrub8Z-VVC4izGsW_bc4A=.1e021a98-3a4e-4ccc-87a7-c3e01c26d5e5@github.com> References: <72-LBIFmV82Z0GRieH-cxjyrub8Z-VVC4izGsW_bc4A=.1e021a98-3a4e-4ccc-87a7-c3e01c26d5e5@github.com> Message-ID: On Wed, 1 Dec 2021 02:33:18 GMT, David Holmes wrote:

> @shipilev again I think we need to examine this in terms of impact to our CI. We run different platforms and configurations in different tiers so the costs are not as simple as looking at one run.

Again, I can wait for those who have more insight into Oracle testing pipelines to check their workflows with this change. I have no insight into Oracle infra, so somebody else has to do it. Now that Igor has left, who should we be talking to?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6622

From duke at openjdk.java.net Wed Dec 1 09:06:29 2021
From: duke at openjdk.java.net (Vishal Chand)
Date: Wed, 1 Dec 2021 09:06:29 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v4]
In-Reply-To:
References: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com>
Message-ID:

On Tue, 30 Nov 2021 16:12:57 GMT, Thomas Schatzl wrote:

>> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Rename BOTConstants
>
> src/hotspot/share/gc/parallel/objectStartArray.hpp line 52:
>
>> 50:   static uint _block_size;
>> 51:   static uint _block_size_in_words;
>> 52:
>
> Almost the same naming issue as in the `BlockOffsetTable/SharedArray`; I would prefer if these members (and getters) here were named similarly to the ones there.
> It is true that `ObjectStartArray` and `BlockOffsetTable` are basically the same thing, but any eventual merge is another issue.

Should these be renamed to osa_card_shift? Something like this?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From david.holmes at oracle.com Wed Dec 1 09:11:46 2021
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 1 Dec 2021 19:11:46 +1000
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs
In-Reply-To: <7At_ag4fpFbiKQ81BsVoM9MzmXJJCIPufYSoi1WgIpg=.e2085ff5-34d8-4c7a-a561-9b36100f14a4@github.com>
References: <7At_ag4fpFbiKQ81BsVoM9MzmXJJCIPufYSoi1WgIpg=.e2085ff5-34d8-4c7a-a561-9b36100f14a4@github.com>
Message-ID: <23d6d736-3559-16fe-f1f4-88efebe4dd41@oracle.com>

On 1/12/2021 5:11 pm, Jie Fu wrote:
> On Wed, 1 Dec 2021 03:38:00 GMT, Jie Fu wrote:
>
>>>> @dholmes-ora We see about 25% gain on a micro on our latest platform. There is no cpuid bit for this, so the closest was to check for the new serialize ISA supported on this platform.
>>>
>>> It would be better to add a jmh test for this opt.
>>> Thanks.
>>
>>> @DamonFool There are jmh tests for Arraycopy in test/micro/org/openjdk/bench/java/lang/Arraycopy.java.
>>
>> So how about posting the detailed perf data before and after this patch?
>> Thanks.
>
>>>> @DamonFool There are jmh tests for Arraycopy in test/micro/org/openjdk/bench/java/lang/Arraycopy.java.
>>>
>>>
>>> So how about posting the detailed perf data before and after this patch? Thanks.
>>
>> Before:
>> Benchmark                                    Mode  Cnt   Score   Error  Units
>> ArrayCopy.arrayCopyObject                    avgt    5  19.538 ± 0.073  ns/op
>> ArrayCopy.arrayCopyObjectNonConst            avgt    5  20.513 ± 0.104  ns/op
>> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  15.919 ± 0.652  ns/op
>> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  15.669 ± 0.359  ns/op
>>
>> After:
>> Benchmark                                    Mode  Cnt   Score   Error  Units
>> ArrayCopy.arrayCopyObject                    avgt    5  16.957 ± 0.584  ns/op
>> ArrayCopy.arrayCopyObjectNonConst            avgt    5  17.221 ± 0.036  ns/op
>> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  12.952 ± 0.068  ns/op
>> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  13.562 ± 0.124  ns/op
>
> Thanks @sviswa7 for sharing.
> So the performance numbers look good on Intel's latest AVX512 platform.
>
> We don't use the 64-byte instructions as default on Intel's old AVX512 platforms, right?
> If so, is a performance regression possible on the old platforms after this patch?
> Thanks.

The old platforms, for which serialize() is not true, will just use AVX3Threshold as they do today.

David
----

> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/6512
>

From shade at openjdk.java.net Wed Dec 1 09:13:36 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 1 Dec 2021 09:13:36 GMT
Subject: RFR: 8277893: Arraycopy stress tests [v2]
In-Reply-To:
References:
Message-ID: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com>

> I would like to fork the new tests off the JDK-8150730.
> These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable.
>
> A brief tour of these tests:
>
>  - Tests all data types;
>  - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc;
>  - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases;
>  - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases;
>  - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types;
>
> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain.
>
> Test times:
>
>
> # x86_64 (TR 3970X)
> real 6m37.855s
> user 56m23.004s
> sys 0m20.148s
>
> # x86_32 (TR 3970X)
> real 11m22.877s
> user 168m8.137s
> sys 5m7.037s
>
> # x86_64 (i5-11500)
> real 15m55.424s
> user 118m0.969s
> sys 0m12.039s
>
> # AArch64 (ThunderX2)
> real 4m5.177s
> user 32m7.295s
> sys 0m19.689s
>
>
> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`.
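
As a rough illustration of the "small arrays exhaustively" item above, here is a minimal sketch of what such a check can look like for `int[]`. It is not the test from the PR: the class name `ArraycopySmallSketch` and its helpers are hypothetical, and it only exercises whichever arraycopy stub the running JVM happens to pick. The reference result is built by copying through a temporary, which is how `System.arraycopy` is specified to behave when source and destination ranges overlap.

```java
import java.util.Arrays;

// Hypothetical sketch, not the PR's test: exhaustive conjoint/disjoint checks on small int[] arrays.
final class ArraycopySmallSketch {

    // Reference result: copy the source range through a temporary array first.
    static int[] expected(int[] src, int srcPos, int[] dst, int dstPos, int len) {
        int[] tmp = Arrays.copyOfRange(src, srcPos, srcPos + len);
        int[] out = dst.clone();
        for (int i = 0; i < len; i++) {
            out[dstPos + i] = tmp[i];
        }
        return out;
    }

    // Array with distinct, non-zero contents so misplaced elements are visible.
    static int[] fresh(int size) {
        int[] a = new int[size];
        for (int i = 0; i < size; i++) {
            a[i] = i + 1;
        }
        return a;
    }

    static void check(int[] got, int[] want, String what) {
        if (!Arrays.equals(got, want)) {
            throw new AssertionError(what + ": got " + Arrays.toString(got)
                    + ", want " + Arrays.toString(want));
        }
    }

    // Tries every (size, srcPos, dstPos, len) combination up to maxSize; returns the number of cases run.
    static int checkAll(int maxSize) {
        int cases = 0;
        for (int size = 0; size <= maxSize; size++) {
            for (int srcPos = 0; srcPos <= size; srcPos++) {
                for (int dstPos = 0; dstPos <= size; dstPos++) {
                    int maxLen = size - Math.max(srcPos, dstPos);
                    for (int len = 0; len <= maxLen; len++) {
                        String what = "size=" + size + " srcPos=" + srcPos
                                + " dstPos=" + dstPos + " len=" + len;

                        // Conjoint: source and destination are the same array, ranges may overlap.
                        int[] a = fresh(size);
                        int[] want = expected(a, srcPos, a, dstPos, len);
                        System.arraycopy(a, srcPos, a, dstPos, len);
                        check(a, want, "conjoint " + what);

                        // Disjoint: separate source and destination arrays.
                        int[] s = fresh(size);
                        int[] d = new int[size];
                        want = expected(s, srcPos, d, dstPos, len);
                        System.arraycopy(s, srcPos, d, dstPos, len);
                        check(d, want, "disjoint " + what);

                        cases += 2;
                    }
                }
            }
        }
        return cases;
    }

    public static void main(String[] args) {
        System.out.println("cases checked: " + checkAll(16));
    }
}
```

A real stress test would repeat this for every primitive type and `Object`, and would run it under each stub configuration (for example by forking VMs with different `-XX:UseAVX` settings), which this sketch does not attempt.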
>
> Additional testing:
>  - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy`
>  - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy`
>  - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy`

Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision:

 - Separate test group and hooks into hotspot_slow_compiler
 - Trim down MAX_SIZE and explain the choice

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6594/files
  - new: https://git.openjdk.java.net/jdk/pull/6594/files/678086a7..da7ed51e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6594&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6594&range=00-01

  Stats: 17 lines in 2 files changed: 12 ins; 1 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6594.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6594/head:pull/6594

PR: https://git.openjdk.java.net/jdk/pull/6594

From shade at openjdk.java.net Wed Dec 1 09:13:36 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 1 Dec 2021 09:13:36 GMT
Subject: RFR: 8277893: Arraycopy stress tests [v2]
In-Reply-To:
References: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com>
Message-ID:

On Tue, 30 Nov 2021 20:45:35 GMT, Vladimir Kozlov wrote:

>> Yes, we can. Actually, working on #6622, I realized these test groups would be introduced anyway. So these new arraycopy tests should probably go to `hotspot_slow_compiler` group, along with other `stress` tests. This would hook arraycopy tests into `hotspot:tier3` automatically if #6622 lands. Tell me if you still want a completely separate test group, or `hotspot_slow_compiler` is enough for current Oracle testing infra.
>
> Please, create separate test group and add it to `hotspot_slow_compiler`. We would not need to change infra settings if more testing is added to this new group later.

Done in new commit.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6594

From shade at openjdk.java.net Wed Dec 1 09:13:37 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 1 Dec 2021 09:13:37 GMT
Subject: RFR: 8277893: Arraycopy stress tests [v2]
In-Reply-To: <9L5CHY8n-6csbW9jfsnXt4pSqnabXH5R7dt2pZFDmdA=.e53d343e-f7fd-46b1-a8af-02dba3fad3ec@github.com>
References: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> <9L5CHY8n-6csbW9jfsnXt4pSqnabXH5R7dt2pZFDmdA=.e53d343e-f7fd-46b1-a8af-02dba3fad3ec@github.com>
Message-ID:

On Tue, 30 Nov 2021 21:22:09 GMT, Aleksey Shipilev wrote:

>> Okay. I was concerned because of the times you show. I am fine with running tests up to 10-15 mins but not this:
>>
>> # x86_64 (i5-11500)
>> real 41m32.622s
>> user 447m19.986s
>> sys 0m21.026s
>>
>>
>> Do you know why it takes so much time on it?
>
> That small machine has very slow memory compared to the other ones. The parallelism in stress tests (9 types, 2 forked VMs each) brings that machine to its knees. There is a blurb about that effect here: https://github.com/openjdk/jdk/pull/6594/files#diff-f72fee20a49daaf4e05002372e93f426407ecd429a227393e2ec79e821042c90R40-R47 -- I don't think it would matter much if we trim `MAX_SIZE`, but I'll try tomorrow. Edit: I also remembered that this machine is also the only AVX-512 capable one in the mix, so the power/frequency mess that AVX-512 is probably does not help; I will look into it tomorrow too.

All right. `MAX_SIZE` actually makes a lot of difference for that machine. I trimmed it down to 128K to cater for 64K pages, and added some explanation for the choice. See new commit. Also updated the PR body with new timings.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6594

From qingfeng.yy at alibaba-inc.com Wed Dec 1 07:13:07 2021
From: qingfeng.yy at alibaba-inc.com (Yi Yang)
Date: Wed, 01 Dec 2021 15:13:07 +0800
Subject: Re: [External] : Re: RFC - Improving C2 Escape Analysis
In-Reply-To: <4bb3c804-d9fd-b9da-a4d3-c504d2e46933@oracle.com>
References: <20210930140335.648146897@eggemoggin.niobe.net> <415a6622-a46c-33da-8e39-c8f3068c7df3@oracle.com> <44563450-403B-4A15-95AB-5FB5DCA4ED0B@oracle.com> <81f86a0b-dfb7-0b45-1779-49209a82ae40@oracle.com> <0f30507c-e0f0-c380-568b-ac441611e116@oracle.com> <787f8fbb-83e6-0867-1c97-ae2516df114b@oracle.com> <457a3277-bc96-d481-2a69-4559f25cd52e@oracle.com> , <4bb3c804-d9fd-b9da-a4d3-c504d2e46933@oracle.com>
Message-ID: <52d9a09f-e492-444d-9668-ddf0049d75e3.qingfeng.yy@alibaba-inc.com>

I read the discussion in this thread and the aforementioned discussion of PEA by AWS, as well as related papers. I wonder whether it is a good/feasible direction to implement PEA for C2 along the lines of what the Graal compiler does. Actually, I did some exploration on this out of personal interest, but I'm not sure if this direction is correct and if it is worth the time to continue.

Graal's escape analysis constructs the CFG first, then traverses each basic block; each basic block has an associated state set that records the alias and object state. It clones the state for successor blocks at a control flow split, and merges the state at a control flow merge point. For example:

    Object obj = new Object(); // Replace Allocation with VirtualAllocationNode
    // CFG split
    if (...) {
        return obj; // Find alias, i.e. VirtualAllocationNode, still virtual now
    } else {
        staticField = obj; // Materialize VirtualAllocationNode, i.e. real allocation
        return obj; // Proj of materialized VirtualAllocationNode
    }

It is able to solve the most critical control flow problem mentioned in the earlier discussion. But it brings some new problems:

0. The CFG is constructed during code generation while escape analysis happens during optimization. GCM can construct the CFG and schedule nodes into basic blocks, but PhaseCFG::schedule_late can not work without Matcher.
1. Essentially it moves the location of the object allocation. Graal deletes the original allocation node and replaces it with VirtualAllocationNode. This node will become a real allocation where the object needs to be materialized. Can an object (Allocation) be created on demand where it should materialize? I'm worried about this.
2. If we implement PEA for C2 according to what Graal does, should we completely rewrite subsequent optimizations? That would definitely take significant effort as far as I can see. PEA is only an analysis; any subsequent optimizations such as scalar replacement and lock elimination need to be reimplemented, because the existing implementation relies on AllocationNode::_non_escape, while Graal's PEA uses a completely novel approach.

------------------------------------------------------------------
From:Vladimir Kozlov
Send Time:2021 Nov. 12 (Fri.) 04:38
To:Cesar Soares Lucas; Tobias Hartmann
Cc:John Rose; hotspot-dev at openjdk.java.net; Brian Stafford; Martijn Verburg; "Hohensee, Paul"; Monica Beckwith; David Therkelsen
Subject:Re: [External] : Re: RFC - Improving C2 Escape Analysis

Hi Cesar,

On 11/11/21 11:24 AM, Cesar Soares Lucas wrote:
> Hi Vladimir,
>
> Thank you for the feedback and sorry for the delay in getting back to you!
>
> > Yes, finding solution for allocation merges (or NULL) is a pain. I spent some
> > time investigating possible solutions for it but "no cigar". Maybe we do
> > indeed need control flow analysis to resolve this.
>
> Can you elaborate a bit on the approaches you tried and why you didn't like
> them? By allocation merges do you mean nested objects like "obj1.obj2.x",
> right? Did you try solving both control-flow merge issues and also allocation
> merges?

I mean control flow merges of allocations, like in your "Code Example 4".

I tried to create separate unique instance IDs (in addition to Node::_idx) to use for the merged allocations case (not the NULL case), which would look like one allocation after the merge point with different paths for field initialization. But I stumbled on some issues and did not proceed further. After some thinking I decided that it is the wrong approach since it still doesn't solve the main merge issue of flow-insensitive analysis:

https://bugs.openjdk.java.net/browse/JDK-6726999
test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java

The issue with deeply nested allocations `new A(new B(new C()))` will be addressed by the Iterative EA I propose:

https://bugs.openjdk.java.net/browse/JDK-8276455

>
> > There are 2 test files with small methods for different EA cases I used to
> > see how EA works:
>
> These examples are being very helpful, thank you again!
>
> > Yes, I think it would be good to have a prototype if you are comfortable to
> > work with C2 code already. I proposed small RFEs just for warmup ;)
>
> I talked with my colleagues and we decided to start the work by trying to fix
> the control/data-flow merge issues - *perhaps not for all cases, but at least
> for some of them*. Then, based on our experience with this and some
> benchmarking we'll decide if we really need flow-sensitive analysis and how to
> best approach that.

Use Test6726999.java for that. It may need to be modified to verify correctness of results (currently it just prints the result).

>
> We'll definitely take a look at the RFEs as we move along! Implementing Stadler
> algorithm was just something that crossed my mind initially, it's very likely
> the last approach we'd try ... I don't want to bite more than I can chew..

I may look at some RFEs myself after I am done with 8276455. Please let me know if you pick one, to avoid duplicated work.
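
For readers who have not seen the shape being discussed, the following is a minimal sketch of a "control flow merge of allocations". It is not the "Code Example 4" referred to above, whose contents are not part of this message; the class `AllocationMergeSketch` and the `Point` type are hypothetical. In `single`, one non-escaping allocation feeds the field loads directly. In `merged`, two different allocations reach the same loads through a control flow join, which is the pattern the thread describes as hard for a flow-insensitive analysis. The sketch only demonstrates the source-level shape; it does not check what any particular JVM build does with either method.

```java
// Hypothetical illustration of an allocation merge; not taken from the thread's examples.
final class AllocationMergeSketch {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // One allocation, never stored or passed anywhere: the simple non-escaping case.
    static int single(int a) {
        Point p = new Point(a, a + 1);
        return p.x + p.y;
    }

    // Two allocations merge at the join after the if/else; neither escapes,
    // but the loads below see "one of two objects" rather than a single allocation.
    static int merged(boolean cond, int a) {
        Point p;
        if (cond) {
            p = new Point(a, 1);
        } else {
            p = new Point(1, a);
        }
        return p.x + p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Run both shapes often enough that a JIT compiler gets a chance to compile them.
        for (int i = 0; i < 1_000_000; i++) {
            sum += single(i & 7) + merged((i & 1) == 0, i & 7);
        }
        System.out.println(sum);
    }
}
```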

Regards,
Vladimir K

>
>
> Regards,
> Cesar
> ------------------------------------------------------------------------------------------------------------------------
> *From:* Vladimir Kozlov
> *Sent:* October 29, 2021 5:27 PM
> *To:* Cesar Soares Lucas; Tobias Hartmann; Ron Pressler
> *Cc:* John Rose; Mark Reinhold; hotspot-dev at openjdk.java.net; Brian Stafford; Martijn Verburg; Hohensee, Paul
> *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis
> On 10/29/21 4:50 PM, Cesar Soares Lucas wrote:
>> Hi Vladimir and Tobias,
>>
>> >> Sure, here are four examples of EA and/or scalarization failing due to
>> >> complicated control/data flow:
>> >> https://cr.openjdk.java.net/~thartmann/EA_examples
>>
>> >> There are 2 test files with small methods for different EA cases I used to
>> >> see how EA works:
>> >>
>> >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>> >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java
>>
>> Thank you for the examples, Tobias/Vladimir. This is being very helpful.
>>
>> >> Yes, finding solution for allocation merges (or NULL) is a pain. I spent
>> >> some time investigating possible solutions for it but "no cigar". Maybe we
>> >> do indeed need control flow analysis to resolve this.
>>
>> By "need control flow analysis" you mean the flow-sensitive EA algorithm? My
>
> Yes.
>
> To clarify. I investigated solutions in current flow-insensitive EA.
>
>> first idea to handle these control/data-merge issues was to implement in C2 the
>> same algorithm used by Graal - i.e., the algorithm described in the Stadler et al.
>> PEA paper. Do you think this is reasonable?
>
> Yes, I think it would be good to have a prototype if you are comfortable to work with C2 code already.
> I proposed small RFEs just for warmup ;)
>
>>
>> >> I am currently looking at iterative EA. Do more EA rounds if we can
>> >> eliminate more connected allocations. It was proposed by Vladimir Ivanov and
>> >> I have working prototype.
>>
>> Cool! I'm curious, when do you plan to submit a Pull Request for this?
>
> I am investigating regressions in some benchmarks.
>
>>
>> >> There is also suggestion from Amazon Java group about "C2 Partial Escape
>> >> Analysis" which needs more discussion:
>> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html
>>
>> I'd love to hear from them about their experience with these issues and if they
>> have any plans to work on this moving forward! I'll ping them on the thread
>> that you linked above.
>
> Yes, I would like them to participate too (CCing to Paul). They sent proposal almost 6 months ago and we did not hear
> any additional information after Vladimir Ivanov replied.
>
> Regards,
> Vladimir K
>
>>
>>
>> Regards,
>> Cesar
>> ------------------------------------------------------------------------------------------------------------------------
>> *From:* Vladimir Kozlov
>> *Sent:* October 27, 2021 10:26 AM
>> *To:* Tobias Hartmann; Cesar Soares Lucas; Ron Pressler
>> *Cc:* John Rose; Mark Reinhold; hotspot-dev at openjdk.java.net; Brian Stafford; Martijn Verburg
>> *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis
>> First.
>> Thank you, Cesar, for collecting data about C2 EA shortcomings.
>>
>> I agree with the cases Tobias pointed out as possible starting points to improve EA.
>>
>> Yes, finding solution for allocation merges (or NULL) is a pain. I spent some time investigating possible solutions for
>> it but "no cigar". Maybe we do indeed need control flow analysis to resolve this.
>>
>> I looked through JBS and found a few issues which do not require writing a new EA:
>>
>> https://bugs.openjdk.java.net/browse/JDK-7149991
>> https://bugs.openjdk.java.net/browse/JDK-8059378
>> https://bugs.openjdk.java.net/browse/JDK-8073358
>> https://bugs.openjdk.java.net/browse/JDK-8155769
>>
>> Tobias also has a fix prototype for the next bug, which was not fixed yet:
>> https://bugs.openjdk.java.net/browse/JDK-8236493
>>
>> There are 2 test files with small methods for different EA cases I used to see how EA works:
>>
>> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java
>>
>> You can start by looking at the above RFEs/bugs, or run these tests and see why scalarization failed for some cases. Except for
>> known merge issue:
>>
>> https://bugs.openjdk.java.net/browse/JDK-6853701
>>
>> I am currently looking at iterative EA. Do more EA rounds if we can eliminate more connected allocations. It was
>> proposed by Vladimir Ivanov and I have working prototype.
>>
>> There is also a suggestion from the Amazon Java group about "C2 Partial Escape Analysis" which needs more discussion:
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html
>>
>> Thanks,
>> Vladimir K
>>
>> On 10/27/21 3:04 AM, Tobias Hartmann wrote:
>>> Hi Cesar,
>>>
>>> On 27.10.21 08:20, Cesar Soares Lucas wrote:
>>>> Right. I was suspecting this to be the most critical issue indeed. However, I
>>>> didn't know there was a case where "... the object does not escape on any paths
>>>> but control flow is too complicated for EA to prove that." Is this an issue
>>>> tracked in JBS or perhaps you can show me an example where this happens?
>>>
>>> Sure, here are four examples of EA and/or scalarization failing due to complicated control/data
>>> flow: https://cr.openjdk.java.net/~thartmann/EA_examples
>>>
>>> All examples would completely fold with inline types (Valhalla).
>>>
>>> I'm not sure if these issues are tracked by JBS issues but there's most likely an overlap with some
>>> of the issues you already described.
>>>
>>> Best regards,
>>> Tobias
>>>

From neliasso at openjdk.java.net Wed Dec 1 09:23:28 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Wed, 1 Dec 2021 09:23:28 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs
In-Reply-To: <7At_ag4fpFbiKQ81BsVoM9MzmXJJCIPufYSoi1WgIpg=.e2085ff5-34d8-4c7a-a561-9b36100f14a4@github.com>
References: <7At_ag4fpFbiKQ81BsVoM9MzmXJJCIPufYSoi1WgIpg=.e2085ff5-34d8-4c7a-a561-9b36100f14a4@github.com>
Message-ID:

On Wed, 1 Dec 2021 07:08:33 GMT, Jie Fu wrote:

> > We don't use the 64-byte instructions as default on Intel's old AVX512 platforms, right? If so, is a performance regression possible on the old platforms after this patch? Thanks.

As I understand it, old AVX512 platforms will continue to work as before. The new case is that new platforms (that have avx_threshold set to 0) will use 64-byte instructions.

But it would be nice to have some benchmarks that verify that there are no regressions on old avx512 hardware.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From neliasso at openjdk.java.net Wed Dec 1 09:29:28 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Wed, 1 Dec 2021 09:29:28 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Nov 2021 00:10:39 GMT, Sandhya Viswanathan wrote:

>> Currently 32-byte instructions are used for small array copy and clear.
>> This can be optimized by using 64-byte instructions.
>>
>> Please review.
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix whitespace

I am happy with the change but would like to see some benchmarks that verify that there are no regressions - [before/after]*[avx2/old avx512/new avx512]. You have already posted some of them - please add the missing ones.

-------------

Changes requested by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6512

From jiefu at openjdk.java.net Wed Dec 1 09:42:30 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Wed, 1 Dec 2021 09:42:30 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs
In-Reply-To:
References: <7At_ag4fpFbiKQ81BsVoM9MzmXJJCIPufYSoi1WgIpg=.e2085ff5-34d8-4c7a-a561-9b36100f14a4@github.com>
Message-ID: <869rFsjOuFR2Yt9bfmz45l4s6uRaFQc5IkPW3eJyZ8E=.e1a6695f-258d-4ec6-a28a-9fcba5e53ce9@github.com>

On Wed, 1 Dec 2021 09:20:48 GMT, Nils Eliasson wrote:

> As I understand it - old AVX512 platforms will continue to work as before.

According to @sviswa7's comments (no cpuid bit for the latest ISA), `is_intel_family_core() && supports_serialize()` can't distinguish all the old AVX512 platforms from the latest ones.
So I think it is possible that some old AVX512 machines will behave differently after this optimization.

@sviswa7, can you further explain how the 64-byte instructions differ between Intel's old and latest AVX512 platforms?
Why can't we enable them by default on old platforms?

Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From duke at openjdk.java.net Wed Dec 1 09:57:51 2021
From: duke at openjdk.java.net (Fei Gao)
Date: Wed, 1 Dec 2021 09:57:51 GMT
Subject: RFR: 8277619: AArch64: Incorrect parameter type in Advanced SIMD Copy assembler functions
Message-ID:

Mov (from general) incorrectly uses SIMD_Arrangement as the parameter type of the assembler function. According to the Arm ARM [1], it is more precise to use SIMD_RegVariant here. The situation is similar for Mov (to general) [2]. Note that as Mov (to general) is an alias of UMOV, we reuse the UMOV encoding for Mov (to general) in this patch.

[1] https://developer.arm.com/documentation/ddi0602/2020-12/SIMD-FP-Instructions/MOV--from-general---Move-general-purpose-register-to-a-vector-element--an-alias-of-INS--general--
[2] https://developer.arm.com/documentation/ddi0602/2020-12/SIMD-FP-Instructions/MOV--to-general---Move-vector-element-to-general-purpose-register--an-alias-of-UMOV-

-------------

Commit messages:
 - 8277619: AArch64: Incorrect parameter type in Advanced SIMD Copy assembler functions

Changes: https://git.openjdk.java.net/jdk/pull/6629/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6629&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277619
  Stats: 52 lines in 7 files changed: 1 ins; 3 del; 48 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6629.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6629/head:pull/6629

PR: https://git.openjdk.java.net/jdk/pull/6629

From simonis at openjdk.java.net Wed Dec 1 11:09:55 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Wed, 1 Dec 2021 11:09:55 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v11]
In-Reply-To:
References:
Message-ID:

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance.
> A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a tenfold performance improvement for the above code:
>
> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.

Volker Simonis has updated the pull request with a new target base due to a merge or a rebase.
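
To make the "implicit exceptions used for normal control flow" pattern from the description concrete, here is a small self-contained sketch. It is an illustration only: `ImplicitExceptionSketch` and its lookup table are hypothetical and do not reproduce Tomcat's actual `IS_ALPHA` contents, and the range-check variant is a source-level alternative shown for comparison, not what the patch does (the patch aims to keep code of the first shape fast without changing it).

```java
// Hypothetical illustration; the table contents are an assumption, not Tomcat's.
final class ImplicitExceptionSketch {

    static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (char c = 'a'; c <= 'z'; c++) {
            IS_ALPHA[c] = true;
        }
        for (char c = 'A'; c <= 'Z'; c++) {
            IS_ALPHA[c] = true;
        }
    }

    // Same shape as the quoted method: an out-of-range index is handled by
    // catching the implicit ArrayIndexOutOfBoundsException.
    static boolean isAlphaViaException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Explicit range check: the out-of-range case never creates an exception object.
    static boolean isAlphaViaRangeCheck(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        // Both variants must agree for in-range and out-of-range inputs.
        for (int c = -300; c <= 300; c++) {
            if (isAlphaViaException(c) != isAlphaViaRangeCheck(c)) {
                throw new AssertionError("mismatch at " + c);
            }
        }
        System.out.println("variants agree");
    }
}
```

The fraction of out-of-range inputs plays the role of the `exceptionProbability` parameter in the benchmark tables: the more often the first variant takes the exceptional path, the more the cost of creating the exception dominates.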
The pull request now contains six commits:

 - Fix jit/t/t105/t105.java to also use -XX:-OptimizeImplicitExceptions in addition to -XX:-OmitStacktracesInFastThrow
 - Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions
 - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters

   Rebased on top of '8275908: Record null_check traps for calls and array_check traps in the interpreter'
 - Fix special case where we're creating an implicit exception for a regular invoke* bytecode
 - Minor updates as requested by @TheRealMDoerr
 - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5488/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=10
  Stats: 238 lines in 13 files changed: 214 ins; 0 del; 24 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net Wed Dec 1 11:12:34 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Wed, 1 Dec 2021 11:12:34 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v10]
In-Reply-To: <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com>
References: <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com>
Message-ID:

On Thu, 18 Nov 2021 10:21:01 GMT, Volker Simonis wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>>
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one used for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request with a new target base due to a merge or a rebase.

I've rebased the PR on top of [JDK-8275908](https://bugs.openjdk.java.net/browse/JDK-8275908) which already adds the required WhiteBox functionality and considerably simplifies the test for this change.

Awaiting @vnkozlov review.
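
For readers following the thread, the pattern under discussion can be sketched in plain Java as below. This is an illustration only, not code from the PR or from Tomcat: the class name `ImplicitExceptionSketch`, the 128-entry `IS_ALPHA` table and the `isAlphaChecked` variant are assumptions made for the example. It contrasts the implicit-exception form quoted above with an explicit range check that returns the same answers without ever creating an exception.

```java
// Illustrative sketch only; names and table size are assumptions, not Tomcat or PR code.
class ImplicitExceptionSketch {
    // Hypothetical lookup table covering the 7-bit ASCII range.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Same shape as the quoted method: an out-of-range index raises an implicit
    // ArrayIndexOutOfBoundsException, which is then used as control flow.
    static boolean isAlpha(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Explicit range check: the exceptional path never exists.
    static boolean isAlphaChecked(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = {'a', 'Z', '5', -1, 128, 0x20AC};
        for (int c : samples) {
            System.out.println(c + " -> " + isAlpha(c) + " / " + isAlphaChecked(c));
        }
    }
}
```

Only the first variant depends on how the JIT handles implicit exceptions, so it is the one whose cost would be expected to change between the two flag combinations in the benchmark tables.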

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From tschatzl at openjdk.java.net Wed Dec 1 11:27:28 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Wed, 1 Dec 2021 11:27:28 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v4]
In-Reply-To:
References: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com>
Message-ID:

On Wed, 1 Dec 2021 09:03:02 GMT, Vishal Chand wrote:

>> src/hotspot/share/gc/parallel/objectStartArray.hpp line 52:
>>
>>> 50:   static uint _block_size;
>>> 51:   static uint _block_size_in_words;
>>> 52:
>>
>> Almost the same naming issue as in the `BlockOffsetTable/SharedArray`; I would prefer if these members (and getters) here were named similarly to the ones there.
>> It is true that `ObjectStartArray` and `BlockOffsetTable` are basically the same thing, but any eventual merge is another issue.
>
> Shall these be renamed to osa_card_shift? Something like this?

Okay, `_osa_card_shift` and so on is also acceptable to me, but just `_card_shift`/`_card_size` would be fine too. We typically do not prefix members with abbreviations of the name of the class they are in. I see now that we have that `_bot` prefix in the other class, but for me the `bot` prefix refers not necessarily to the class but to the concept of a block offset table.

Maybe just reduce the names in both cases to `_card_size`/`_card_shift`/`_log_card_`? What do others think?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From duke at openjdk.java.net Wed Dec 1 12:09:35 2021
From: duke at openjdk.java.net (Vishal Chand)
Date: Wed, 1 Dec 2021 12:09:35 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v4]
In-Reply-To:
References: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com>
Message-ID:

On Wed, 1 Dec 2021 11:20:12 GMT, Thomas Schatzl wrote:

>> Shall these be renamed to osa_card_shift? Something like this?
>
> Okay, `_osa_card_shift` and so on is also acceptable to me, but just `_card_shift`/`_card_size` would be fine too. We typically do not prefix members with abbreviations of the name of the class they are in. I see now that we have that `_bot` prefix in the other class, but for me the `bot` prefix refers not necessarily to the class but to the concept of a block offset table.
>
> Maybe just reduce the names in both cases to `_card_size`/`_card_shift`/`_log_card_`? What do others think?

You mean that for both blockOffsetTable and objectStartArray, all the relevant members should be renamed to _card_size and so on?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From phedlin at openjdk.java.net Wed Dec 1 12:37:41 2021
From: phedlin at openjdk.java.net (Patric Hedlin)
Date: Wed, 1 Dec 2021 12:37:41 GMT
Subject: RFR: 8251216: Implement MD5 intrinsics on AArch64
Message-ID:

Implementation of MD5 intrinsic support for AArch64.

Contributed by Ludovic Henry (@luhenry).

Speedup measured (in Aurora running Ampere Altra) as follows:

openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1048576-provider:...29.39%
openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2047-provider:.........28.91%
openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2048-provider:.........28.81%
openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1023-provider:.........28.43%
openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1024-provider:.........28.32%
openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:511-provider:...........27.78%
openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:512-provider:...........27.62%
openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:255-provider:...........26.52%
openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:256-provider:...........26.38%
openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:127-provider:...........25.41%
openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:128-provider:...........24.66%

Testing tier1-7.

-------------

Commit messages:
 - 8251216: Implement MD5 intrinsics on AArch64

Changes: https://git.openjdk.java.net/jdk/pull/6628/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6628&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8251216
  Stats: 199 lines in 4 files changed: 193 ins; 1 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6628.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6628/head:pull/6628

PR: https://git.openjdk.java.net/jdk/pull/6628

From dholmes at openjdk.java.net Wed Dec 1 12:44:26 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Wed, 1 Dec 2021 12:44:26 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs
In-Reply-To: <869rFsjOuFR2Yt9bfmz45l4s6uRaFQc5IkPW3eJyZ8E=.e1a6695f-258d-4ec6-a28a-9fcba5e53ce9@github.com>
References: <7At_ag4fpFbiKQ81BsVoM9MzmXJJCIPufYSoi1WgIpg=.e2085ff5-34d8-4c7a-a561-9b36100f14a4@github.com> <869rFsjOuFR2Yt9bfmz45l4s6uRaFQc5IkPW3eJyZ8E=.e1a6695f-258d-4ec6-a28a-9fcba5e53ce9@github.com>
Message-ID:

On Wed, 1 Dec 2021 09:39:18 GMT, Jie Fu wrote:

>> As I understand it - old AVX512 platforms will continue to work as before.
>
> According to @sviswa7's comments (no cpuid bit for the latest ISA), `is_intel_family_core() && supports_serialize()` can't distinguish all the old AVX512 platforms from the latest ones. So I think it may be possible some old AVX512 machines will behave differently after this opt.

I do not see such comments. From my previous questions on this it was indicated that any CPU that supports `serialize` has the improved performance.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From aph at openjdk.java.net Wed Dec 1 13:28:28 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 1 Dec 2021 13:28:28 GMT
Subject: RFR: 8251216: Implement MD5 intrinsics on AArch64
In-Reply-To:
References:
Message-ID:

On Wed, 1 Dec 2021 09:24:45 GMT, Patric Hedlin wrote:

> Implementation of MD5 intrinsic support for AArch64.
>
> Contributed by Ludovic Henry (@luhenry).
>
> Speedup measured (in Aurora running Ampere Altra) as follows:
>
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1048576-provider:...29.39%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2047-provider:.........28.91%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2048-provider:.........28.81%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1023-provider:.........28.43%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1024-provider:.........28.32%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:511-provider:...........27.78%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:512-provider:...........27.62%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:255-provider:...........26.52%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:256-provider:...........26.38%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:127-provider:...........25.41%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:128-provider:...........24.66%
>
> Testing tier1-7.

MD5 has been proven insecure, and its weaknesses have been exploited in the field. It is disabled in many systems.
I am surprised that we are thinking of accelerating it for possible future use, and that we're adding a worse-than-useless crypto algorithm to the AArch64 startup.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6628

From phedlin at openjdk.java.net Wed Dec 1 13:44:30 2021
From: phedlin at openjdk.java.net (Patric Hedlin)
Date: Wed, 1 Dec 2021 13:44:30 GMT
Subject: RFR: 8251216: Implement MD5 intrinsics on AArch64
In-Reply-To:
References:
Message-ID:

On Wed, 1 Dec 2021 09:24:45 GMT, Patric Hedlin wrote:

> Implementation of MD5 intrinsic support for AArch64.
>
> Contributed by Ludovic Henry (@luhenry).
>
> Speedup measured (in Aurora running Ampere Altra) as follows:
>
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1048576-provider:...29.39%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2047-provider:.........28.91%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2048-provider:.........28.81%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1023-provider:.........28.43%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1024-provider:.........28.32%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:511-provider:...........27.78%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:512-provider:...........27.62%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:255-provider:...........26.52%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:256-provider:...........26.38%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:127-provider:...........25.41%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:128-provider:...........24.66%
>
> Testing tier1-7.

Fair point.
But does that also rule out all uses, as long as it's supported? Not all hashes have to be exposed and on the upside, it's rather fast (well, that's also one of its downsides). Who should make the choice to use it or not?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6628

From jiefu at openjdk.java.net Wed Dec 1 13:54:22 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Wed, 1 Dec 2021 13:54:22 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs
In-Reply-To:
References: <7At_ag4fpFbiKQ81BsVoM9MzmXJJCIPufYSoi1WgIpg=.e2085ff5-34d8-4c7a-a561-9b36100f14a4@github.com> <869rFsjOuFR2Yt9bfmz45l4s6uRaFQc5IkPW3eJyZ8E=.e1a6695f-258d-4ec6-a28a-9fcba5e53ce9@github.com>
Message-ID:

On Wed, 1 Dec 2021 12:41:14 GMT, David Holmes wrote:

> From my previous questions on this it was indicated that any CPU that supports `serialize` has the improved performance.

If so, CPUs that don't support `serialize` would behave as before. Then there shouldn't be any performance regression.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From jvernee at openjdk.java.net Wed Dec 1 14:46:29 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Wed, 1 Dec 2021 14:46:29 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Nov 2021 11:04:44 GMT, xpbob wrote:

>> Unsafe is used in many Java frameworks.
>> When the framework has an unsafe memory leak, there is no way to know what code is causing it.
>> Add unsafe allocation event to jfr.
>> Records the size and stack allocated.
>> This event is off by default.
>
> xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
>  - Merge branch 'openjdk:master' into JDK-8277930
>  - remove whitespace
>  - add free and Reallocate event
>  - Merge branch 'openjdk:master' into JDK-8277930
>  - 8277930: Add unsafe allocation event to jfr

I ran some benchmarks as well: http://cr.openjdk.java.net/~jvernee/UnsafeTest.java

I see about a 6 ns increase in benchmark times with the new code added in (regardless of allocation size), which sounds about right. An unsafe allocation and free takes about 90 ns on my machine with the latest JDK, so the regression is ~6%. (I'm not sure if that's worth worrying about, see below).

Whether this is a hot path or not: I think most user applications that use these APIs are doing so indirectly through direct ByteBuffers, and most of those that I've seen use memory pools to avoid doing BB allocations (which can be slow due to the need to wait for reference processing for instance).

I can't really say if direct calls, or indirect calls through paths other than direct ByteBuffers, are a hot path in applications, but I'd be surprised if they were, tbh, since generally people seem to think of `malloc` as slow (something I've observed in practice as well), and try to build their own allocators on top if they want more speed.

Maybe it's an idea to implement these events in Java code instead? (using the JFR mirror events). I think the overhead should always be zero if the events are disabled in that case, right?
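
A Java-level event along the lines suggested here could look roughly like the sketch below. It uses the public `jdk.jfr.Event` API rather than the JDK-internal mirror events mentioned above, and it is not the approach taken in the PR. The event name `sketch.NativeAllocation`, the class names, and the use of `ByteBuffer.allocateDirect` as the allocation being tracked are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Illustrative sketch only; event and class names are hypothetical.
class JavaLevelAllocationEventSketch {

    @Name("sketch.NativeAllocation")
    @Label("Native Allocation (sketch)")
    static class NativeAllocationEvent extends Event {
        @Label("Allocation Size")
        long allocationSize;
    }

    // Wraps an off-heap allocation and reports it only if a recording wants it.
    static ByteBuffer allocate(int size) {
        NativeAllocationEvent event = new NativeAllocationEvent();
        event.begin();
        ByteBuffer buffer = ByteBuffer.allocateDirect(size);
        event.end();
        // shouldCommit() is false while no recording has enabled this event,
        // so the field write and commit are skipped in that case.
        if (event.shouldCommit()) {
            event.allocationSize = size;
            event.commit();
        }
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = allocate(64);
        System.out.println("allocated " + buffer.capacity() + " bytes, direct=" + buffer.isDirect());
    }
}
```

Whether the disabled path is truly free depends on how the JIT treats the event object and the guard, which this sketch does not measure.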

-------------

PR: https://git.openjdk.java.net/jdk/pull/6591

From sjohanss at openjdk.java.net Wed Dec 1 15:14:24 2021
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Wed, 1 Dec 2021 15:14:24 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v4]
In-Reply-To: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com>
References: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com>
Message-ID:

On Tue, 30 Nov 2021 13:39:41 GMT, Vishal Chand wrote:

>> Changed the visibility, added getters and refactored the following:
>>
>> 1. Card Table Members
>> 2. BOT members
>> 3. ObjectStartArray block members
>
> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:
>
>   Rename BOTConstants

Nice to see this unified. Some comments below.

src/hotspot/share/gc/shared/blockOffsetTable.hpp line 55:

> 53:   static uint _log_bot_card_size_words;
> 54:   static uint _bot_card_size_bytes;
> 55:   static uint _bot_card_size_words;

Maybe change `words` to `in_words` to better match other places where we have sizes in words. And as Thomas suggested above, maybe also drop the `bot` in/pre-fix. And also change the getters below.

-------------

Changes requested by sjohanss (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6570

From sjohanss at openjdk.java.net Wed Dec 1 15:14:24 2021
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Wed, 1 Dec 2021 15:14:24 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v4]
In-Reply-To:
References: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com>
Message-ID:

On Wed, 1 Dec 2021 12:06:29 GMT, Vishal Chand wrote:

>> Okay, `_osa_card_shift` and so on is also acceptable to me, but just `_card_shift`/`_card_size` would be fine too. We typically do not prefix members with abbreviations of the name of the class they are in. I see now that we have that `_bot` prefix in the other class, but for me the `bot` prefix refers not necessarily to the class but to the concept of a block offset table.
>>
>> Maybe just reduce the names in both cases to `_card_size`/`_card_shift`/`_log_card_`? What do others think?
>
> You mean for both blockOffsetTable and objectStartArray, all the relevant members be renamed to _card_size and so on?

I think this sounds like a good plan.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From luhenry at openjdk.java.net Wed Dec 1 15:27:29 2021
From: luhenry at openjdk.java.net (Ludovic Henry)
Date: Wed, 1 Dec 2021 15:27:29 GMT
Subject: RFR: 8251216: Implement MD5 intrinsics on AArch64
In-Reply-To:
References:
Message-ID:

On Wed, 1 Dec 2021 13:25:49 GMT, Andrew Haley wrote:

> MD5 has been proven insecure, and its weaknesses have been exploited in the field. It is disabled in many systems. I am surprised that we are thinking of accelerating it for possible future use, and that we're adding a worse-than-useless crypto algorithm to the AArch64 startup.

I wholeheartedly agree with your take. Unfortunately, it's still used on many systems, like for verifying the integrity of downloads ([Azure Blob Storage](https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.storage.blob.blobproperties.contentmd5?view=azure-dotnet-legacy) for example).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6628

From redestad at openjdk.java.net Wed Dec 1 15:36:25 2021
From: redestad at openjdk.java.net (Claes Redestad)
Date: Wed, 1 Dec 2021 15:36:25 GMT
Subject: RFR: 8251216: Implement MD5 intrinsics on AArch64
In-Reply-To:
References:
Message-ID:

On Wed, 1 Dec 2021 09:24:45 GMT, Patric Hedlin wrote:

> Implementation of MD5 intrinsic support for AArch64.
>
> Contributed by Ludovic Henry (@luhenry).
>
> Speedup measured (in Aurora running Ampere Altra) as follows:
>
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1048576-provider:...29.39%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2047-provider:.........28.91%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2048-provider:.........28.81%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1023-provider:.........28.43%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1024-provider:.........28.32%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:511-provider:...........27.78%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:512-provider:...........27.62%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:255-provider:...........26.52%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:256-provider:...........26.38%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:127-provider:...........25.41%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:128-provider:...........24.66%
>
> Testing tier1-7.

While I think it's good that distributions are free to omit MD5 now, there are still various non-cryptographic uses that warrant continued support and enhancements: checksumming, JDK APIs such as `UUID.nameUUIDFromBytes`, etc. Perhaps there should be a build flag to omit all of this, though?

For startup it remains a sore point that stubs like these are generated eagerly on bootstrap. I hope we'll be able to make this lazy in the near future ([JDK-8231349](https://bugs.openjdk.java.net/browse/JDK-8231349)) to make adding intrinsics come with fewer trade-offs.
This particular stub is very simple and likely adds unnoticeably to bootstrap, but in accumulation it's grown to be a bit of a concern in places, especially on x86 with large AVX-512 intrinsics. I'm not sure if there's been any progress on this recently, though. @vnkozlov?

(I'm not qualified to review Aarch64 code, but this contribution looks ok to me.)

-------------

PR: https://git.openjdk.java.net/jdk/pull/6628

From ddong at openjdk.java.net Wed Dec 1 16:05:28 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Wed, 1 Dec 2021 16:05:28 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 1 Dec 2021 14:43:50 GMT, Jorn Vernee wrote:

> Maybe it's an idea to implement these events in Java code instead? (using the JFR mirror events). I think the overhead should always be zero if the events are disabled in that case, right?

In my opinion, these events will be enabled when a direct memory leak occurs while the application is running. In other words, users will not enable them at startup. So if these events are implemented in the Java layer, some methods will be deoptimized, which causes performance degradation of the application for a period of time, if I understand correctly.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6591

From egahlin at openjdk.java.net Wed Dec 1 16:05:28 2021
From: egahlin at openjdk.java.net (Erik Gahlin)
Date: Wed, 1 Dec 2021 16:05:28 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 1 Dec 2021 14:43:50 GMT, Jorn Vernee wrote:

> I ran some benchmarks as well: http://cr.openjdk.java.net/~jvernee/UnsafeTest.java
>
> I see about a 6 ns increase in benchmark times with the new code added in (regardless of allocation size), which sounds about right. An unsafe allocation and free takes about 90 ns on my machine with the latest JDK, so the regression is ~6%. (I'm not sure if that's worth worrying about, see below).
>
> Whether this is a hot path or not: I think most user applications that use these APIs are doing so indirectly through direct ByteBuffers, and most of those that I've seen use memory pools to avoid doing BB allocations (which can be slow due to the need to wait for reference processing for instance).
>
> I can't really say if direct calls, or indirect calls through paths other than direct ByteBuffers, are a hot path in applications, but I'd be surprised if they were, tbh, since generally people seem to think of `malloc` as slow (something I've observed in practice as well), and try to build their own allocators on top if they want more speed.
>
> Maybe it's an idea to implement these events in Java code instead? (using the JFR mirror events). I think the overhead should always be zero if the events are disabled in that case, right?

Thanks, those were the numbers I was looking for!

We could do them at zero cost in Java (disabled) if we use bytecode instrumentation. If we use mirror events, it may not always work out due to inlining depth. There is also additional startup cost when using Java events.

My worry is that very few people will actually turn these events on, but the rest of the world (99.9999%) will pay the additional 6 ns overhead for every native allocation.

We are planning some rewrites for JDK 19 when it comes to Java events to reduce the startup cost. It could be worth seeing if they could be turned into Java events (with zero overhead), similar to the SocketRead/SocketWrite events.

I also wonder if the events should be called NativeAllocation, NativeReallocation and NativeFree, so they are not tied so hard to the Unsafe implementation.
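
To make the "enabled only when needed" scenario concrete, here is a minimal sketch of a Java event that is switched on after the application is already running and then read back. It relies only on the public `jdk.jfr` API (`Recording`, `RecordingFile`). The event name `sketch.NativeAllocation` borrows the naming proposed above, but the classes, the temporary file and the sizes are hypothetical, and none of this is code from the PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Illustrative sketch only; event and class names are hypothetical.
class NativeEventRecordingSketch {

    @Name("sketch.NativeAllocation")
    @Label("Native Allocation (sketch)")
    static class NativeAllocationEvent extends Event {
        @Label("Allocation Size")
        long allocationSize;
    }

    // Stands in for an instrumented allocation site.
    static void emit(long size) {
        NativeAllocationEvent event = new NativeAllocationEvent();
        event.allocationSize = size;
        event.commit(); // has no effect unless a running recording enabled the event
    }

    // Emits one event before any recording exists and two while one is running,
    // then returns the sum of the sizes that ended up in the recording file.
    static long recordedBytes() {
        try {
            emit(1); // no recording yet, so this one is dropped
            Path file = Files.createTempFile("native-allocation-sketch", ".jfr");
            try (Recording recording = new Recording()) {
                recording.enable(NativeAllocationEvent.class);
                recording.start();
                emit(64);
                emit(128);
                recording.stop();
                recording.dump(file);
            }
            long total = 0;
            for (RecordedEvent e : RecordingFile.readAllEvents(file)) {
                if (e.getEventType().getName().equals("sketch.NativeAllocation")) {
                    total += e.getLong("allocationSize");
                }
            }
            Files.deleteIfExists(file);
            return total;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes seen by the recording: " + recordedBytes());
    }
}
```

The sketch says nothing about deoptimization or inlining cost; it only shows the enable-late, read-back flow that the discussion assumes.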

-------------

PR: https://git.openjdk.java.net/jdk/pull/6591

From zgu at openjdk.java.net Wed Dec 1 16:08:36 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Wed, 1 Dec 2021 16:08:36 GMT
Subject: RFR: 8277990: NMT: Remove NMT shutdown capability
Message-ID: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com>

NMT shutdown functionality is a remnant of its first implementation, which could consume an excessive amount of memory; it therefore needed the capability to shut itself down to ensure the health of the JVM. This is no longer the case for the current implementation.

After JDK-8277946, there is no longer a use for it, so it can be removed.

Test:
- [x] hotspot_nmt
- [x] tier1 with NMT on

-------------

Commit messages:
 - Fix comment
 - Fix windows
 - v0

Changes: https://git.openjdk.java.net/jdk/pull/6640/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6640&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277990
  Stats: 446 lines in 18 files changed: 31 ins; 349 del; 66 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6640.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6640/head:pull/6640

PR: https://git.openjdk.java.net/jdk/pull/6640

From aph at openjdk.java.net Wed Dec 1 17:01:20 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 1 Dec 2021 17:01:20 GMT
Subject: RFR: 8251216: Implement MD5 intrinsics on AArch64
In-Reply-To:
References:
Message-ID: <-j7mfpZ2dqvbuODtPI7RYpIX8HQVLwYhuLnGR-LnQN4=.e5119f03-3fd0-47db-94ec-3d240a236f49@github.com>

On Wed, 1 Dec 2021 15:24:40 GMT, Ludovic Henry wrote:

>> MD5 has been proven insecure, and its weaknesses have been exploited in the field. It is disabled in many systems. I am surprised that we are thinking of accelerating it for possible future use, and that we're adding a worse-than-useless crypto algorithm to the AArch64 startup.
>
> I wholeheartedly agree with your take. Unfortunately, it's still used on many systems, like for verifying the integrity of downloads ([Azure Blob Storage](https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.storage.blob.blobproperties.contentmd5?view=azure-dotnet-legacy) for example).

Ha! OK. This seems like a really weird time to be adding MD5 support, almost four years after MD5 was disabled for jarfile signing, and 15 years after the first practical break. But I guess it's harmless enough, even though I hate having to carry such baggage around.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6628

From aph at openjdk.java.net Wed Dec 1 18:19:28 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 1 Dec 2021 18:19:28 GMT
Subject: RFR: 8251216: Implement MD5 intrinsics on AArch64
In-Reply-To:
References:
Message-ID: <7FVlxJ3bpkRZS43P7VdQJzXh-p7fh7xIOtRo56Mf2qg=.7da106b9-4638-4ff4-a717-5ecf9c4bc2d2@github.com>

On Wed, 1 Dec 2021 09:24:45 GMT, Patric Hedlin wrote:

> Implementation of MD5 intrinsic support for AArch64.
>
> Contributed by Ludovic Henry (@luhenry).
>
> Speedup measured (in Aurora running Ampere Altra) as follows:
>
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1048576-provider:...29.39%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2047-provider:.........28.91%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2048-provider:.........28.81%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1023-provider:.........28.43%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1024-provider:.........28.32%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:511-provider:...........27.78%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:512-provider:...........27.62%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:255-provider:...........26.52%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:256-provider:...........26.38%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:127-provider:...........25.41%
> openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:128-provider:...........24.66%
>
> Testing tier1-7.

Marked as reviewed by aph (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6628

From sviswanathan at openjdk.java.net Wed Dec 1 18:44:30 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 1 Dec 2021 18:44:30 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Nov 2021 00:10:39 GMT, Sandhya Viswanathan wrote:

>> Currently 32-byte instructions are used for small array copy and clear.
>> This can be optimized by using 64-byte instructions.
>>
>> Please review.
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix whitespace

Yes, the patch doesn't change behavior on AVX2 and older AVX512 systems.
The additional performance numbers with the patch requested by Nils are as below:

Old AVX512
Before:
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  18.650 ± 2.773  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  20.241 ± 1.398  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  16.252 ± 0.076  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  15.965 ± 0.172  ns/op

After:
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  17.701 ± 2.623  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  20.588 ± 0.775  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  16.219 ± 0.066  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  15.937 ± 0.185  ns/op

AVX2
Before:
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  23.801 ± 0.090  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  24.376 ± 0.867  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  14.015 ± 0.016  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  15.355 ± 0.024  ns/op

After:
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  23.373 ± 0.629  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  24.390 ± 0.875  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  13.995 ± 0.056  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  15.383 ±
0.051 ns/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From vladimir.kozlov at oracle.com Wed Dec 1 19:35:33 2021
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 1 Dec 2021 11:35:33 -0800
Subject: [External] : Re: RFC - Improving C2 Escape Analysis
In-Reply-To: <52d9a09f-e492-444d-9668-ddf0049d75e3.qingfeng.yy@alibaba-inc.com>
References: <415a6622-a46c-33da-8e39-c8f3068c7df3@oracle.com>
 <44563450-403B-4A15-95AB-5FB5DCA4ED0B@oracle.com>
 <81f86a0b-dfb7-0b45-1779-49209a82ae40@oracle.com>
 <0f30507c-e0f0-c380-568b-ac441611e116@oracle.com>
 <787f8fbb-83e6-0867-1c97-ae2516df114b@oracle.com>
 <457a3277-bc96-d481-2a69-4559f25cd52e@oracle.com>
 <4bb3c804-d9fd-b9da-a4d3-c504d2e46933@oracle.com>
 <52d9a09f-e492-444d-9668-ddf0049d75e3.qingfeng.yy@alibaba-inc.com>
Message-ID:

Thank you, Yang, for your input.

We never planned "implement PEA for C2 according to what Graal compiler does". As you correctly pointed out, it would require a lot of changes incompatible with current C2.

As I understand, Cesar and Co are also looking for a solution for merge cases first without drastically changing C2.

I think we have enough info (or we can add more if needed) in the C2-generated CFG that we can consult to decide if some EA transformations/optimizations are safe without drastically rewriting C2. We just need to know a correct field's value after merge which current EA.

We are not looking at the case you pointed out yet. Maybe we can create something similar to the SafePointScalarObject node for such a purpose.

Thanks,
Vladimir K

On 11/30/21 11:13 PM, Yi Yang wrote:
> I read the discussion in this thread and aforementioned discussion of PEA by AWS, as well as related papers. I wonder is it a good/feasible direction to implement PEA for C2 according to what Graal compiler does. Actually, I did some exploration on this for personal interest, but I'm not sure if this direction is correct and if it is worth the time to continue.
>
> Graal's escape analysis constructs the CFG first, then traverses each basic block; each basic block has an associated state set that records the alias and object state. It clones the state for successor blocks at a control flow split, and merges the state at a control flow merge point.
>
> For example:
> Object obj = new Object(); // Replace Allocation with VirtualAllocationNode
>                            // CFG split
> if (...) {
>   return obj;              // Find alias, i.e. VirtualAllocationNode, still virtual now
> } else {
>   staticField = obj;       // Materialize VirtualAllocationNode, i.e. real allocation
>   return obj;              // Proj of materialized VirtualAllocationNode
> }
> It's able to solve the most critical control flow problem mentioned in earlier discussion. But it brings some new problems:
>
> 0. CFG is constructed during code generation while escape analysis happens during optimization. GCM can construct CFG and schedule nodes into basic blocks, but PhaseCFG::schedule_late cannot work without Matcher.
>
> 1. Essentially it moves the location of the object allocation. Graal deletes the original allocation node and replaces it with VirtualAllocationNode. This node will become a real allocation where the object needs to be materialized. Can an object (Allocation) be created on demand where it should materialize? I'm worried about this.
>
> 2. If we implement PEA for C2 according to what Graal does, should we completely rewrite subsequent optimizations? That would definitely take significant effort as far as I see. PEA is only an analysis; any subsequent optimizations such as scalar replacement and lock elimination need to be reimplemented, because the existing implementation relies on AllocationNode::_non_escape, whereas Graal's PEA uses a completely novel approach.
>
>
> ------------------------------------------------------------------
> From:Vladimir Kozlov
> Send Time:2021 Nov. 12 (Fri.) 04:38
> To:Cesar Soares Lucas; Tobias Hartmann
> Cc:John Rose; hotspot-dev at openjdk.java.net; Brian Stafford; Martijn Verburg; "Hohensee, Paul"; Monica Beckwith; David Therkelsen
> Subject:Re: [External] : Re: RFC - Improving C2 Escape Analysis
>
> Hi Cesar,
>
> On 11/11/21 11:24 AM, Cesar Soares Lucas wrote:
>> Hi Vladimir,
>>
>> Thank you for the feedback and sorry for the delay in getting back to you!
>>
>> > Yes, finding solution for allocation merges (or NULL) is a pain. I spent some
>> > time investigating possible solutions for it but "no cigar". Maybe we do
>> > indeed need control flow analysis to resolve this.
>>
>> Can you elaborate a bit on the approaches you tried and why you didn't like
>> them? By allocation merges do you mean nested objects like "obj1.obj2.x",
>> right? Did you try solving both control-flow merge issues and also allocation
>> merges?
>
> I mean control flow merges of allocations, like in your "Code Example 4".
>
> I tried to create separate unique instance IDs (in addition to Node::_idx) to use for the merged allocations case (not the NULL
> case), which would look like one allocation after the merge point with different paths for field initialization. But
> I stumbled on some issues and did not proceed further. After some thinking I decided that it is the wrong approach since it
> still doesn't solve the main merge issue of flow-insensitive analysis:
>
> https://bugs.openjdk.java.net/browse/JDK-6726999
> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>
> The issue with deep nested allocations `new A(new B(new C()))` will be addressed by the Iterative EA I propose:
> https://bugs.openjdk.java.net/browse/JDK-8276455
>
>>
>> > There are 2 test files with small methods for different EA cases I used to
>> > see how EA works:
>>
>> These examples are being very helpful, thank you again!
>
>> > Yes, I think it would be good to have a prototype if you are comfortable to
>> > work with C2 code already. I proposed small RFEs just for warmup ;)
>>
>> I talked with my colleagues and we decided to start the work by trying to fix
>> the control/data-flow merge issues - *perhaps not for all cases, but at least
>> for some of them*. Then, based on our experience with this and some
>> benchmarking we'll decide if we really need flow-sensitive analysis and how to
>> best approach that.
>
> Use Test6726999.java for that. It may need to be modified to verify correctness of results (currently it just prints the result).
>
>>
>> We'll definitely take a look at the RFEs as we move along! Implementing the Stadler
>> algorithm was just something that crossed my mind initially, it's very likely
>> the last approach we'd try ... I don't want to bite off more than I can chew.
>
> I may look at some RFE myself after I am done with 8276455. Please let me know if you pick one, to avoid duplicated work.
>
> Regards,
> Vladimir K
>
>>
>>
>> Regards,
>> Cesar
>> ------------------------------------------------------------------------------------------------------------------------
>> *From:* Vladimir Kozlov
>> *Sent:* October 29, 2021 5:27 PM
>> *To:* Cesar Soares Lucas; Tobias Hartmann; Ron Pressler
>> *Cc:* John Rose; Mark Reinhold; hotspot-dev at openjdk.java.net; Brian Stafford; Martijn Verburg; Hohensee, Paul
>> *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis
>> On 10/29/21 4:50 PM, Cesar Soares Lucas wrote:
>>> Hi Vladimir and Tobias,
>>>
>>> >> Sure, here are four examples of EA and/or scalarization failing due to
>>> >> complicated control/data flow:
>>> >> https://cr.openjdk.java.net/~thartmann/EA_examples
>>>
>>> >> There are 2 test files with small methods for different EA cases I used to
>>> >> see how EA works:
>>> >>
>>> >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>>> >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java
>>>
>>> Thank you for the examples, Tobias/Vladimir. This is being very helpful.
>>>
>>> >> Yes, finding solution for allocation merges (or NULL) is a pain. I spent
>>> >> some time investigating possible solutions for it but "no cigar". Maybe we
>>> >> do indeed need control flow analysis to resolve this.
>>>
>>> By "need control flow analysis" you mean the flow-sensitive EA algorithm? My
>>
>> Yes.
>>
>> To clarify. I investigated solutions in current flow-insensitive EA.
>>
>>> first idea to handle these control/data-merge issues was to implement in C2 the
>>> same algorithm used by GRAAL - i.e., the algorithm described in the Stadler et al.
>>> PEA paper. Do you think this is reasonable?
>>
>> Yes, I think it would be good to have a prototype if you are comfortable to work with C2 code already.
>> I proposed small RFEs just for warmup ;)
>>
>>>
>>> >> I am currently looking on iterative EA. Do more EA rounds if we can
>>> >> eliminate more connected allocations. It was proposed by Vladimir Ivanov and
>>> >> I have working prototype.
>>>
>>> Cool! I'm curious, when do you plan to submit a Pull Request for this?
>>
>> I am investigating regressions in some benchmarks.
>>
>>>
>>> >> There is also suggestion from Amazon Java group about "C2 Partial Escape
>>> >> Analysis" which needs more discussion:
>>> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html
>>>
>>> I'd love to hear from them about their experience with these issues and if they
>>> have any plans to work on this moving forward! I'll ping them on the thread
>>> that you linked above.
>>
>> Yes, I would like them to participate too (CCing Paul). They sent the proposal almost 6 months ago and we did not hear
>> any additional information after Vladimir Ivanov replied.
>>
>> Regards,
>> Vladimir K
>>
>>>
>>>
>>> Regards,
>>> Cesar
>>> ------------------------------------------------------------------------------------------------------------------------
>>> *From:* Vladimir Kozlov
>>> *Sent:* October 27, 2021 10:26 AM
>>> *To:* Tobias Hartmann; Cesar Soares Lucas; Ron Pressler
>>> *Cc:* John Rose; Mark Reinhold; hotspot-dev at openjdk.java.net; Brian Stafford; Martijn Verburg
>>> *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis
>>> First. Thank you, Cesar, for collecting data about C2 EA shortcomings.
>>>
>>> I agree with the cases Tobias pointed out as possible starting points to improve EA.
>>>
>>> Yes, finding a solution for allocation merges (or NULL) is a pain. I spent some time investigating possible solutions for
>>> it but "no cigar". Maybe we do indeed need control flow analysis to resolve this.
>>>
>>> I looked through JBS and found a few issues which do not require writing a new EA:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-7149991
>>> https://bugs.openjdk.java.net/browse/JDK-8059378
>>> https://bugs.openjdk.java.net/browse/JDK-8073358
>>> https://bugs.openjdk.java.net/browse/JDK-8155769
>>>
>>> Tobias also has a fix prototype for the next bug, which has not been fixed yet:
>>> https://bugs.openjdk.java.net/browse/JDK-8236493
>>>
>>> There are 2 test files with small methods for different EA cases I used to see how EA works:
>>>
>>> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>>> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java
>>>
>>> You can start by looking at the above RFEs/bug or run these tests and see why scalarization failed for some cases. Except for
>>> the known merge issue:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-6853701
>>>
>>> I am currently looking at iterative EA. Do more EA rounds if we can eliminate more connected allocations. It was
>>> proposed by Vladimir Ivanov and I have a working prototype.
>>>
>>> There is also a suggestion from the Amazon Java group about "C2 Partial Escape Analysis" which needs more discussion:
>>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html
>>>
>>> Thanks,
>>> Vladimir K
>>>
>>> On 10/27/21 3:04 AM, Tobias Hartmann wrote:
>>>> Hi Cesar,
>>>>
>>>> On 27.10.21 08:20, Cesar Soares Lucas wrote:
>>>>> Right. I was suspecting this to be the most critical issue indeed. However, I
>>>>> didn't know there was a case where "... the object does not escape on any paths
>>>>> but control flow is too complicated for EA to prove that." Is this an issue
>>>>> tracked in JBS or perhaps you can show me an example where this happens?
>>>>
>>>> Sure, here are four examples of EA and/or scalarization failing due to complicated control/data
>>>> flow: https://cr.openjdk.java.net/~thartmann/EA_examples
>>>>
>>>> All examples would completely fold with inline types (Valhalla).
>>>>
>>>> I'm not sure if these issues are tracked by JBS issues but there's most likely an overlap with some
>>>> of the issues you already described.
>>>>
>>>> Best regards,
>>>> Tobias
>>>>

From vladimir.kozlov at oracle.com Wed Dec 1 20:15:21 2021
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 1 Dec 2021 12:15:21 -0800
Subject: [External] : Re: RFC - Improving C2 Escape Analysis
In-Reply-To:
References: <415a6622-a46c-33da-8e39-c8f3068c7df3@oracle.com>
 <44563450-403B-4A15-95AB-5FB5DCA4ED0B@oracle.com>
 <81f86a0b-dfb7-0b45-1779-49209a82ae40@oracle.com>
 <0f30507c-e0f0-c380-568b-ac441611e116@oracle.com>
 <787f8fbb-83e6-0867-1c97-ae2516df114b@oracle.com>
 <457a3277-bc96-d481-2a69-4559f25cd52e@oracle.com>
 <4bb3c804-d9fd-b9da-a4d3-c504d2e46933@oracle.com>
 <52d9a09f-e492-444d-9668-ddf0049d75e3.qingfeng.yy@alibaba-inc.com>
Message-ID: <54107b06-90b4-c7e5-2d46-79a6732e9dba@oracle.com>

I did not finish sentence:

 > We just need to know correct field's value after merge which current EA.

which current EA does not provide.

Thanks,
Vladimir K

On 12/1/21 11:35 AM, Vladimir Kozlov wrote:
> Thank you, Yang, for your input.
>
> We never planned "implement PEA for C2 according to what Graal compiler does". As you correctly pointed it would require
> a lot of changes incompatible with current C2.
>
> As I understand, Cesar and Co are also looking for a solution for merge cases first without drastically changing C2.
>
> I think we have enough info (or we can add more if needed) in C2 generated CFG we can consult to decide if some EA
> transformation/optimization are safe without drastically rewriting C2. We just need to know correct field's value after
> merge which current EA.
>
> We are not looking on the case you pointed yet. May be we can create something similar to SafePointScalarObject node for
> such purpose.
>
> Thanks,
> Vladimir K

From kvn at openjdk.java.net Wed Dec 1 20:30:30 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 1 Dec 2021 20:30:30 GMT
Subject: RFR: 8277358: Accelerate CRC32-C [v3]
In-Reply-To: <_uLnNO847-foX9jDiBc2nSNfiwhgl5pV-0WV5NXViZg=.4c73d6cb-966a-42f2-aade-87edfc69a316@github.com>
References: <_uLnNO847-foX9jDiBc2nSNfiwhgl5pV-0WV5NXViZg=.4c73d6cb-966a-42f2-aade-87edfc69a316@github.com>
Message-ID:

On Wed, 1 Dec 2021 02:56:06 GMT, Scott Gibbons wrote:

>> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement.
>>
>> 5986.947899319073 MB/s => 24041.05203089616 MB/s
>> 5840.02689336947 MB/s => 24898.781468710356 MB/s
>>
>> ********** Original ***********
>>
>>
>> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000
>> offset = 0
>> msgSize = 512 bytes
>> iters = 20000000
>> -------------------------------------------------------
>> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
>> CRC32C.update(byte[]) runtime = 1.710387358 seconds
>> CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s
>> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
>> -------------------------------------------------------
>> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
>> CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds
>> CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s
>> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
>> -------------------------------------------------------
>>
>>
>>
>>
>> *********** With my changes: *************
>>
>>
>>
>> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000
>> offset = 0
>> msgSize = 512 bytes
>> iters = 20000000
>> -------------------------------------------------------
>> CRCs: crc = ae10ee5a, crcReference = ae10ee5a
>> CRC32C.update(byte[]) runtime = 0.425938099 seconds
CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds >> CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fixing review comments Nice work. Let me test it before approval. ------------- PR: https://git.openjdk.java.net/jdk/pull/6595 From kvn at openjdk.java.net Wed Dec 1 20:59:35 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 1 Dec 2021 20:59:35 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: References: <72-LBIFmV82Z0GRieH-cxjyrub8Z-VVC4izGsW_bc4A=.1e021a98-3a4e-4ccc-87a7-c3e01c26d5e5@github.com> Message-ID: On Wed, 1 Dec 2021 08:20:22 GMT, Aleksey Shipilev wrote: > > @shipilev again I think we need to examine this in terms of impact to our CI. We run different platforms and configurations in different tiers so the costs are not as simple as looking at one run. > > Again, I can wait for those who have more insight in Oracle testing pipelines check their workflows with this change. I have no insight in Oracle infra, so somebody else have to do it. Now that Igor left, who should we be talking to? @dholmes-ora I checked and this change does not interfere with our CI. `tier2` and `tier3` introduced by #5241 are not used by our CI. New `tier2_compiler` and `tier3_compiler` groups are also not used. We use different sets in CI. I am not sure how else it can affect our testing. I also submitted our testing. I will let you know results. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6622

From iklam at openjdk.java.net Wed Dec 1 21:02:48 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Wed, 1 Dec 2021 21:02:48 GMT
Subject: RFR: 8275731: CDS archived enums objects are recreated at runtime
Message-ID: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com>

**Background:**

In the Java Language, Enums can be tested for equality, so the constants in an Enum type must be unique. Javac compiles an enum declaration like this:

    public enum Day { SUNDAY, MONDAY ... }

to

    public class Day extends java.lang.Enum {
        public static final Day SUNDAY = new Day("SUNDAY");
        public static final Day MONDAY = new Day("MONDAY");
        ...
    }

With CDS archived heap objects, `Day::<clinit>` is executed twice: once during `java -Xshare:dump`, and once during normal JVM execution. If the archived heap objects reference one of the Enum constants created at dump time, we will violate the uniqueness requirements of the Enum constants at runtime. See the test case in the description of [JDK-8275731](https://bugs.openjdk.java.net/browse/JDK-8275731)

**Fix:**

During -Xshare:dump, if we discover that an Enum constant of type X is archived, we archive all constants of type X. At runtime, type X will skip the normal execution of `X::<clinit>`. Instead, we run `HeapShared::initialize_enum_klass()` to retrieve all the constants of X that were saved at dump time.

This is safe as we know that `X::<clinit>` has no observable side effect -- it only creates the constants of type X, as well as the synthetic value `X::$VALUES`, which cannot be observed until X is fully initialized.

**Verification:**

To avoid future problems, I added a new tool, CDSHeapVerifier, to look for similar problems where the archived heap objects reference a static field that may be recreated at runtime. There are some manual steps involved, but I analyzed the potential problems found by the tool and they are all safe (after the current bug is fixed).
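To make the Background paragraph above concrete, here is a minimal sketch of why running an enum's static initializer twice would be a problem. It is not HotSpot or CDS code: `FakeDay` and `runInitializer()` are hypothetical stand-ins for the javac-generated class and its static initializer, used only because a real enum cannot be instantiated twice from plain Java.

```java
// A minimal sketch, not HotSpot code. FakeDay and runInitializer() are
// hypothetical stand-ins for a javac-generated enum class and its static
// initializer.
class EnumIdentityDemo {
    enum Day { SUNDAY, MONDAY }

    // Plain class shaped like the desugared enum: one object per constant.
    static final class FakeDay {
        final String name;
        FakeDay(String name) { this.name = name; }
    }

    // Simulates one execution of the static initializer: every call
    // allocates a fresh set of "constants".
    static FakeDay[] runInitializer() {
        return new FakeDay[] { new FakeDay("SUNDAY"), new FakeDay("MONDAY") };
    }

    // For a real enum, every lookup yields the same unique object.
    static boolean realEnumIsUnique() {
        return Day.valueOf("SUNDAY") == Day.SUNDAY;
    }

    // Two initializer runs (think: dump time and runtime) give objects with
    // the same name but different identity, so a reference saved from the
    // first run no longer compares equal with == to the second run's constant.
    static boolean duplicatedConstantsAreIdentical() {
        FakeDay[] dumpTime = runInitializer();
        FakeDay[] runTime = runInitializer();
        return dumpTime[0] == runTime[0];
    }

    public static void main(String[] args) {
        System.out.println("real enum unique: " + realEnumIsUnique());
        System.out.println("duplicated constants identical: "
                + duplicatedConstantsAreIdentical());
    }
}
```

The second check returning false is the situation the fix avoids: instead of creating a second set of constants at runtime, the archived set is reused.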
See cdsHeapVerifier.cpp for gory details. An example trace of this tool can be found at https://bugs.openjdk.java.net/secure/attachment/97242/enum_warning.txt

**Testing:**

Passed Oracle CI tiers 1-4. Will run tier 5 as well.

-------------

Commit messages:
 - 8275731: CDS archived enums objects are recreated at runtime

Changes: https://git.openjdk.java.net/jdk/pull/6653/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6653&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8275731
  Stats: 829 lines in 16 files changed: 787 ins; 2 del; 40 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6653.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6653/head:pull/6653

PR: https://git.openjdk.java.net/jdk/pull/6653

From Divino.Cesar at microsoft.com Wed Dec 1 21:03:15 2021
From: Divino.Cesar at microsoft.com (Cesar Soares Lucas)
Date: Wed, 1 Dec 2021 21:03:15 +0000
Subject: [External] : Re: RFC - Improving C2 Escape Analysis
In-Reply-To: <54107b06-90b4-c7e5-2d46-79a6732e9dba@oracle.com>
References: <415a6622-a46c-33da-8e39-c8f3068c7df3@oracle.com> <44563450-403B-4A15-95AB-5FB5DCA4ED0B@oracle.com> <81f86a0b-dfb7-0b45-1779-49209a82ae40@oracle.com> <0f30507c-e0f0-c380-568b-ac441611e116@oracle.com> <787f8fbb-83e6-0867-1c97-ae2516df114b@oracle.com> <457a3277-bc96-d481-2a69-4559f25cd52e@oracle.com> <4bb3c804-d9fd-b9da-a4d3-c504d2e46933@oracle.com> <52d9a09f-e492-444d-9668-ddf0049d75e3.qingfeng.yy@alibaba-inc.com> <54107b06-90b4-c7e5-2d46-79a6732e9dba@oracle.com>
Message-ID:

Thank you, Vladimir and Yang, for the feedback.

I just want to reiterate my plan of work on this in order to improve clarity. We're looking into a solution for allocation merges at the moment; we'll probably be looking into bringing some form of flow-sensitive EA later on after we get some progress with allocation merges. We don't have plans to implement the Stadler/GRAAL EA algorithm in C2 at the moment.

Thanks,
Cesar

PS: Sorry for the delay... I'm in the middle of a leave.
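For readers following the thread, the "allocation merge" case discussed here can be illustrated with a small Java sketch. The class and method names are hypothetical and are not taken from Test6726999.java or the linked examples. In `merged`, neither object escapes the method, but the reference `p` is a merge of two different allocations at the control-flow join, which is the shape the thread says the current flow-insensitive EA cannot scalarize.

```java
// A minimal sketch with hypothetical names; it only shows the code shape
// under discussion and does not measure or prove what C2 does with it.
final class AllocationMergeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // One allocation that never escapes: the simple case for scalar
    // replacement.
    static int noMerge(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    // Two allocations reach the same variable through different branches.
    // After the if/else, p refers to "one of two" objects, so the field
    // loads below read from a merge of allocations.
    static int merged(boolean cond, int a, int b) {
        Point p;
        if (cond) {
            p = new Point(a, b);
        } else {
            p = new Point(b, a + 1);
        }
        return p.x + p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Enough iterations that a JIT compiler is likely to compile both
        // methods.
        for (int i = 0; i < 100_000; i++) {
            sum += noMerge(i, 1) + merged((i & 1) == 0, i, 1);
        }
        System.out.println(sum);
    }
}
```

Running a method like this with allocation profiling or the EA-related diagnostic flags of a debug build is one way to check whether the objects in `merged` were actually eliminated. The exact flags depend on the JDK build, so none are shown here.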
________________________________ From: Vladimir Kozlov Sent: December 1, 2021 12:15 PM To: Yi Yang ; Cesar Soares Lucas ; Tobias Hartmann Cc: John Rose ; hotspot-dev at openjdk.java.net ; Brian Stafford ; Martijn Verburg ; Hohensee, Paul ; Monica Beckwith ; David Therkelsen Subject: Re: [External] : Re: RFC - Improving C2 Escape Analysis I did not finish sentence: > We just need to know correct field's value after merge which current EA. which current EA does not provide. Thanks, Vladimir K On 12/1/21 11:35 AM, Vladimir Kozlov wrote: > Thank you, Yang, for your input. > > We never planned "implement PEA for C2 according to what Graal compiler does". As you correctly pointed it would require > a lot of changes incompatible with current C2. > > As I understand, Cesar and Co are also looking for a solution for merge cases first without drastically changing C2. > > I think we have enough info (or we can add more if needed) in C2 generated CFG we can consult to decide if some EA > transformation/optimization are safe without drastically rewriting C2. We just need to know correct field's value after > merge which current EA. > > We are not looking on the case you pointed yet. May be we can create something similar to SafePointScalarObject node for > such purpose. > > Thanks, > Vladimir K > > On 11/30/21 11:13 PM, Yi Yang wrote: >> I read the discussion in this thread and aforementioned discussion of PEA by AWS, as well as related papers. I wonder >> is it a good/feasible direction to implement PEA for C2 according to what Graal compiler does. Actually, I did some >> exploration on this for personal interest, but I'm not sure if this direction is correct and if it is worth the time >> to continue. >> >> Graal's escape analysis constructs CFG first, then traverses each basic block, each basic block has an associated >> state set that records the alias and object state. 
It will clone the state for successor blocks at the control flow >> split, and merge the state at control flow merge point. >> >> For example: >> Object obj = new Object(); // Replace Allocation with VirtualAllocationNode >> // CFG split >> if (...) { >> return obj; // Find alias, i.e VirtualAllocationNode, still virtual now >> } else { >> staticField = obj; // Materialize VirtualAllocationNode, i.e. real allocation >> return obj; // Proj of materialized VirtualAllocationNode >> } >> It's able to solve the most critical control flow problem mentioned in earlier discussion. But it brings some new >> problems: >> >> 0. CFG is constructed during code generation while escape analysis happens during optimization. GCM can construct CFG >> and schedule nodes into basic blocks, but PhaseCFG::schedule_late can not work without Matcher. >> >> 1. Essentially it moves the location of the object allocation. Graal deletes the original allocation node and replaces >> it with VirtualAllocationNode. This node will become a real allocation where the object needs to be materialized. Can >> an object(Allocation) be created on demand where it should materialize? I'm worried about this. >> >> 2. If we implement PEA for C2 according to what Graal does, should we completely rewrite subsequent optimizations? >> That would definitely take significant effort as far as I see. PEA is only analysis, any subsequent optimizations such >> as scalar replacement and lock elimination need to be reimplemented. Because the existing implementation relies on >> AllocationNode::_non_escape, Graal's PEA uses a completely novel approach. >> >> >> ------------------------------------------------------------------ >> From:Vladimir Kozlov >> Send Time:2021 Nov. 12 (Fri.) 
04:38 >> To:Cesar Soares Lucas ; Tobias Hartmann >> Cc:John Rose ; hotspot-dev at openjdk.java.net ; Brian Stafford >> ; Martijn Verburg ; "Hohensee, Paul" >> ; Monica Beckwith ; David Therkelsen >> Subject:Re: [External] : Re: RFC - Improving C2 Escape Analysis >> >> Hi Cesar, >> >> On 11/11/21 11:24 AM, Cesar Soares Lucas wrote: >>> Hi Vladimir, >>> >>> Thank you for the feedback and sorry for the delay in getting back to you! >>> >>> > Yes, finding solution for allocation merges (or NULL) is a pain. I spent some >>> > time investigating possible solutions for it but "no cigar". May be we do >>> > indead need control flow analysis to resolve this. >>> >>> Can you elaborate a bit on the approaches you tried and why you didn't like >>> them? By allocation merges do you mean nested objects like "obj1.obj2.x", >>> right? Did you try solving both control-flow merge issues and also allocation >>> merges? >> >> I mean control flow merges of allocations, like in your "Code Example 4". >> >> I tried to create separate unique instance IDs (in addition to Node::_idx) to use for merged allocations case (not NULL >> case) which would look like one allocation after merge point with different paths for fields initialization. But >> stumbles on some issues and did not proceed further. 
>> After some thinking I decided that it is the wrong approach since it
>> still doesn't solve the main merge issue of flow-insensitive analysis:
>>
>> https://bugs.openjdk.java.net/browse/JDK-6726999
>> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>>
>> The issue with deeply nested allocations `new A(new B(new C()))` will be addressed by the Iterative EA I propose:
>> https://bugs.openjdk.java.net/browse/JDK-8276455
>>
>>>
>>> > There are 2 test files with small methods for different EA cases I used to
>>> > see how EA works:
>>>
>>> These examples are being very helpful, thank you again!
>>>
>>> > Yes, I think it would be good to have a prototype if you are comfortable to
>>> > work with C2 code already. I proposed small RFEs just for warmup ;)
>>>
>>> I talked with my colleagues and we decided to start the work by trying to fix
>>> the control/data-flow merge issues - *perhaps not for all cases, but at least
>>> for some of them*. Then, based on our experience with this and some
>>> benchmarking we'll decide if we really need flow-sensitive analysis and how to
>>> best approach that.
>>
>> Use Test6726999.java for that. It may need to be modified to verify correctness of results (currently it just prints
>> the result).
>>
>>>
>>> We'll definitely take a look at the RFEs as we move along! Implementing Stadler
>>> algorithm was just something that crossed my mind initially, it's very likely
>>> the last approach we'd try ... I don't want to bite more than I can chew..
>>
>> I may look on some RFE myself after I am done with 8276455. Please, let me know if you pick one to avoid duplicated work.
>>
>> Regards,
>> Vladimir K
>>
>>>
>>>
>>> Regards,
>>> Cesar
>>> ------------------------------------------------------------------------------------------------------------------------
>>> *From:* Vladimir Kozlov
>>> *Sent:* October 29, 2021 5:27 PM
>>> *To:* Cesar Soares Lucas; Tobias Hartmann; Ron Pressler
>>> *Cc:* John Rose; Mark Reinhold; hotspot-dev at openjdk.java.net; Brian Stafford; Martijn Verburg; Hohensee, Paul
>>> *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis
>>> On 10/29/21 4:50 PM, Cesar Soares Lucas wrote:
>>>> Hi Vladimir and Tobias,
>>>>
>>>> >> Sure, here are four examples of EA and/or scalarization failing due to
>>>> >> complicated control/data flow:
>>>> >>
>>>> >> https://cr.openjdk.java.net/~thartmann/EA_examples
>>>>
>>>> >> There are 2 test files with small methods for different EA cases I used to
>>>> >> see how EA works:
>>>> >>
>>>> >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>>>> >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java
>>>>
>>>> Thank you for the examples, Tobias/Vladimir. This is being very helpful.
>>>>
>>>> >> Yes, finding solution for allocation merges (or NULL) is a pain.
>>>> >> I spent some time investigating possible solutions for it but "no cigar". Maybe we
>>>> >> do indeed need control flow analysis to resolve this.
>>>>
>>>> By "need control flow analysis" you mean the flow-sensitive EA algorithm? My
>>>
>>> Yes.
>>>
>>> To clarify. I investigated solutions in current flow-insensitive EA.
>>>
>>>> first idea to handle these control/data-merge issues was to implement in C2 the
>>>> same algorithm used by GRAAL - i.e., the algorithm described in the Stadler et al.
>>>> PEA paper. Do you think this is reasonable?
>>>
>>> Yes, I think it would be good to have a prototype if you are comfortable to work with C2 code already.
>>> I proposed small RFEs just for warmup ;)
>>>
>>>>
>>>> >> I am currently looking at iterative EA. Do more EA rounds if we can
>>>> >> eliminate more connected allocations. It was proposed by Vladimir Ivanov and
>>>> >> I have a working prototype.
>>>>
>>>> Cool! I'm curious, when do you plan to submit a Pull Request for this?
>>>
>>> I am investigating regressions in some benchmarks.
>>>
>>>>
>>>> >> There is also a suggestion from the Amazon Java group about "C2 Partial Escape
>>>> >> Analysis" which needs more discussion:
>>>> >>
>>>> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html
>>>>
>>>> I'd love to hear from them about their experience with these issues and if they
>>>> have any plans to work on this moving forward! I'll ping them on the thread
>>>> that you linked above.
>>>
>>> Yes, I would like them to participate too (CCing to Paul).
>>> They sent the proposal almost 6 months ago and we did not hear
>>> any additional information after Vladimir Ivanov replied.
>>>
>>> Regards,
>>> Vladimir K
>>>
>>>>
>>>>
>>>> Regards,
>>>> Cesar
>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> *From:* Vladimir Kozlov
>>>> *Sent:* October 27, 2021 10:26 AM
>>>> *To:* Tobias Hartmann; Cesar Soares Lucas; Ron Pressler
>>>> *Cc:* John Rose; Mark Reinhold; hotspot-dev at openjdk.java.net; Brian Stafford; Martijn Verburg
>>>> *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis
>>>> First. Thank you, Cesar, for collecting data about C2 EA shortcomings.
>>>>
>>>> I agree with the cases Tobias pointed out as possible starting points to improve EA.
>>>>
>>>> Yes, finding a solution for allocation merges (or NULL) is a pain. I spent some time investigating possible solutions for
>>>> it but "no cigar". Maybe we do indeed need control flow analysis to resolve this.
>>>>
>>>> I looked through JBS and found a few issues which do not require writing a new EA:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-7149991
>>>> https://bugs.openjdk.java.net/browse/JDK-8059378
>>>> https://bugs.openjdk.java.net/browse/JDK-8073358
>>>> https://bugs.openjdk.java.net/browse/JDK-8155769
>>>>
>>>> Tobias also has a fix prototype for the next bug, which has not been fixed yet:
>>>> https://bugs.openjdk.java.net/browse/JDK-8236493
>>>>
>>>> There are 2 test files with small methods for different EA cases I used to see how EA works:
>>>>
>>>> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java
>>>> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java
>>>>
>>>> You can start looking at the above RFE/bug or run these tests and see why scalarization failed for some cases. Except for
>>>> known merge issue:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-6853701
>>>>
>>>> I am currently looking at iterative EA. Do more EA rounds if we can eliminate more connected allocations. It was
>>>> proposed by Vladimir Ivanov and I have a working prototype.
>>>>
>>>> There is also a suggestion from the Amazon Java group about "C2 Partial Escape Analysis" which needs more discussion:
>>>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html
>>>>
>>>> Thanks,
>>>> Vladimir K
>>>>
>>>> On 10/27/21 3:04 AM, Tobias Hartmann wrote:
>>>>> Hi Cesar,
>>>>>
>>>>> On 27.10.21 08:20, Cesar Soares Lucas wrote:
>>>>>> Right. I was suspecting this to be the most critical issue indeed. However, I
>>>>>> didn't know there was a case where "... the object does not escape on any paths
>>>>>> but control flow is too complicated for EA to prove that." Is this an issue
>>>>>> tracked in JBS or perhaps you can show me an example where this happens?
>>>>>
>>>>> Sure, here are four examples of EA and/or scalarization failing due to complicated control/data
>>>>> flow:
>>>>> https://cr.openjdk.java.net/~thartmann/EA_examples
>>>>>
>>>>> All examples would completely fold with inline types (Valhalla).
>>>>>
>>>>> I'm not sure if these issues are tracked by JBS issues but there's most likely an overlap with some
>>>>> of the issues you already described.
>>>>> >>>>> Best regards, >>>>> Tobias >>>>> From kvn at openjdk.java.net Wed Dec 1 21:03:30 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 1 Dec 2021 21:03:30 GMT Subject: RFR: 8277893: Arraycopy stress tests [v2] In-Reply-To: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com> References: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com> Message-ID: On Wed, 1 Dec 2021 09:13:36 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. >> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. 
>> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 6m37.855s >> user 56m23.004s >> sys 0m20.148s >> >> # x86_32 (TR 3970X) >> real 11m22.877s >> user 168m8.137s >> sys 5m7.037s >> >> # x86_64 (i5-11500) >> real 15m55.424s >> user 118m0.969s >> sys 0m12.039s >> >> # AArch64 (ThunderX2) >> real 4m5.177s >> user 32m7.295s >> sys 0m19.689s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - Separate test group and hooks into hotspot_slow_compiler > - Trim down MAX_SIZE and explain the choice Good. Let me test it before approval. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From sviswanathan at openjdk.java.net Wed Dec 1 22:10:29 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 1 Dec 2021 22:10:29 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v3] In-Reply-To: <_uLnNO847-foX9jDiBc2nSNfiwhgl5pV-0WV5NXViZg=.4c73d6cb-966a-42f2-aade-87edfc69a316@github.com> References: <_uLnNO847-foX9jDiBc2nSNfiwhgl5pV-0WV5NXViZg=.4c73d6cb-966a-42f2-aade-87edfc69a316@github.com> Message-ID: On Wed, 1 Dec 2021 02:56:06 GMT, Scott Gibbons wrote: >> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. 
>> >> 5986.947899319073 MB/s => 24041.05203089616 MB/s >> 5840.02689336947 MB/s => 24898.781468710356 MB/s >> >> ********** Original *********** >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 1.710387358 seconds >> CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds >> CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> >> >> >> >> *********** With my changes: ************* >> >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 0.425938099 seconds >> CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds >> CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fixing review comments The patch looks good to me. 
Please wait for Vladimir Kozlov's testing and approval. ------------- Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6595 From jiefu at openjdk.java.net Wed Dec 1 23:22:28 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 1 Dec 2021 23:22:28 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6] In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 18:41:33 GMT, Sandhya Viswanathan wrote: > Yes, the patch doesn't change behavior on AVX2 and older AVX512 systems. Thanks for your clarification. But it still remains unknown why the 64-byte instructions shouldn't be used on CPUs which don't support `serialize`. I will test the 64-byte instructions on older AVX512 systems today and feedback here. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From duke at openjdk.java.net Thu Dec 2 00:20:52 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Thu, 2 Dec 2021 00:20:52 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v4] In-Reply-To: References: Message-ID: <41Q1gB0N_YPjDB9nr4l74d2o0QKaYKsjC60s9xg4nk8=.45d4275e-fd64-44aa-8adf-13e480612ea1@github.com> > Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. 
> > 5986.947899319073 MB/s => 24041.05203089616 MB/s > 5840.02689336947 MB/s => 24898.781468710356 MB/s > > ********** Original *********** > > > scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 > offset = 0 > msgSize = 512 bytes > iters = 20000000 > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(byte[]) runtime = 1.710387358 seconds > CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds > CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > > > > > *********** With my changes: ************* > > > > scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 > offset = 0 > msgSize = 512 bytes > iters = 20000000 > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(byte[]) runtime = 0.425938099 seconds > CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds > CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: - MICRO to MILLI as requested. - Fixing benchmark to throughput with default iterations. 
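The throughput figures in the log output above can be reproduced from the printed message size, iteration count and runtime. The following is a minimal sketch, not code from the PR or from `TestCRC32C.java`: the class and method names are hypothetical, and it assumes the test reports MB as 10^6 bytes, which is what the printed numbers are consistent with.

```java
// Hypothetical helper; names are illustrative and not taken from the JDK test.
final class Crc32cThroughput {

    // Throughput in MB/s, assuming 1 MB = 1,000,000 bytes.
    static double throughputMBps(long msgSizeBytes, long iters, double runtimeSeconds) {
        double totalBytes = (double) (msgSizeBytes * iters);
        return totalBytes / runtimeSeconds / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Values copied from the CRC32C.update(byte[]) runs quoted in the message.
        double before = throughputMBps(512, 20_000_000L, 1.710387358);
        double after = throughputMBps(512, 20_000_000L, 0.425938099);

        System.out.printf("before  = %.3f MB/s%n", before);
        System.out.printf("after   = %.3f MB/s%n", after);
        System.out.printf("speedup = %.2fx%n", after / before);
    }
}
```

The ratio of the two results is where the "~4x" in the PR summary comes from.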
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6595/files - new: https://git.openjdk.java.net/jdk/pull/6595/files/92b4b9fc..906a57d6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6595&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6595&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6595.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6595/head:pull/6595 PR: https://git.openjdk.java.net/jdk/pull/6595 From duke at openjdk.java.net Thu Dec 2 00:34:27 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Thu, 2 Dec 2021 00:34:27 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v4] In-Reply-To: <41Q1gB0N_YPjDB9nr4l74d2o0QKaYKsjC60s9xg4nk8=.45d4275e-fd64-44aa-8adf-13e480612ea1@github.com> References: <41Q1gB0N_YPjDB9nr4l74d2o0QKaYKsjC60s9xg4nk8=.45d4275e-fd64-44aa-8adf-13e480612ea1@github.com> Message-ID: <8Bf2Ltih4gJC_tPCSzDfm-ANaovXOJhMJWP204N76D8=.1dab2f77-f62e-4041-8c60-d68532347bfa@github.com> On Thu, 2 Dec 2021 00:20:52 GMT, Scott Gibbons wrote: >> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. 
>> >> 5986.947899319073 MB/s => 24041.05203089616 MB/s >> 5840.02689336947 MB/s => 24898.781468710356 MB/s >> >> ********** Original *********** >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 1.710387358 seconds >> CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds >> CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> >> >> >> >> *********** With my changes: ************* >> >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 0.425938099 seconds >> CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds >> CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- > > Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: > > - MICRO to MILLI as requested. 
> - Fixing benchmark to throughput with default iterations.

Hi, Eric. Thanks for the suggestions. I've made the changes.

Thanks,
--Scott Gibbons
Software Development Engineer, Runtime Engineering
Intel Corporation | www.intel.com

From: mlbridge[bot] ***@***.***>
Sent: Wednesday, December 1, 2021 4:02 PM
To: openjdk/jdk ***@***.***>
Cc: Gibbons, Scott ***@***.***>; Mention ***@***.***>
Subject: Re: [openjdk/jdk] 8277358: Accelerate CRC32-C (PR #6595)

Mailing list message from eric.caspole at ***@***.***> on ***@***.***>:

Hi Scott,
Thanks for the JMH. I would like to use Mode.Throughput (i.e. 9368.786 ± 96.956 ops/ms) so the scores are not very tiny numbers, and just use the default iterations so the runs are about 35 minutes instead of 1h30. What do you think? The iterations are very stable, so the defaults are fine in my testing.
Regards,
Eric

diff --git a/test/micro/org/openjdk/bench/java/util/TestCRC32C.java b/test/micro/org/openjdk/bench/java/util/TestCRC32C.java
index 10681e19bbf..0c3b39fc59a 100644
--- a/test/micro/org/openjdk/bench/java/util/TestCRC32C.java
+++ b/test/micro/org/openjdk/bench/java/util/TestCRC32C.java
@@ -27,12 +27,10 @@ import java.util.concurrent.TimeUnit;
 import java.util.zip.CRC32C;
 import org.openjdk.jmh.annotations.*;
-@BenchmarkMode(Mode.AverageTime)
-@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@BenchmarkMode(Mode.Throughput)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
 ***@***.******@***.***(Scope.Benchmark)>
 ***@***.***(value = 2)
-@Warmup(iterations = 2, time = 30, timeUnit = TimeUnit.SECONDS)
-@Measurement(iterations = 3, time = 60, timeUnit = TimeUnit.SECONDS)
 public class TestCRC32C {
-------------

PR: https://git.openjdk.java.net/jdk/pull/6595

From kvn at openjdk.java.net Thu Dec 2 00:58:32 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 2 Dec 2021 00:58:32 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v11]
In-Reply-To:
References:
Message-ID:

On Wed, 1 Dec 2021 11:09:55 GMT, Volker Simonis wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>>
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
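For readers who want to see the control-flow pattern in isolation, here is a minimal, self-contained sketch of the exception-driven lookup quoted in the description, next to an explicit bounds check that returns the same answers. It is not code from the PR or from Tomcat: the class name, the `isAlphaWithCheck` variant and the way the `IS_ALPHA` table is filled (ASCII letters only) are assumptions made for illustration.

```java
// Illustrative sketch only; the table contents and helper names are assumptions.
final class IsAlphaSketch {

    // Stand-in for Tomcat's lookup table, filled here for ASCII letters only.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) {
            IS_ALPHA[c] = true;
        }
        for (int c = 'A'; c <= 'Z'; c++) {
            IS_ALPHA[c] = true;
        }
    }

    // Pattern from the PR description: an out-of-range index raises an implicit
    // ArrayIndexOutOfBoundsException, which is caught and used as control flow.
    public static boolean isAlphaWithException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Same result with an explicit range check, so no exception is ever thrown.
    public static boolean isAlphaWithCheck(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = {'a', 'Z', '0', -1, 128, 1_000_000};
        for (int c : samples) {
            System.out.println(c + ": " + isAlphaWithException(c) + " / " + isAlphaWithCheck(c));
        }
    }
}
```

The larger the share of out-of-range inputs, the more often the first variant takes the exception path, which corresponds to the `exceptionProbability` parameter in the benchmark tables.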
>
> Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>
>  - Fix jit/t/t105/t105.java to also use -XX:-OptimizeImplicitExceptions in addition to -XX:-OmitStacktracesInFastThrow
>  - Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions
>  - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters
>    Rebased on top of '8275908: Record null_check traps for calls and array_check traps in the interpreter'
>  - Fix special case where we're creating an implicit exception for a regular invoke* bytecode
>  - Minor updates as requested by @TheRealMDoerr
>  - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

Volker,

What does the MDO (bytecodes and counters) look like for your test case method (-XX:CompileCommand=print,ImplicitException.isAlphaWithException)?

src/hotspot/share/opto/graphKit.cpp line 627:

> 625:     const TypeKlassPtr *ex_type = TypeKlassPtr::make(ex_ciInstKlass);
> 626:     kill_dead_locals();
> 627:     Node* ex_node = new_instance(makecon(ex_type), NULL, NULL, true);

What happens if deoptimization happens during this allocation (which is a safepoint)? Which bytecode will be executed in the interpreter after deopt?

src/hotspot/share/opto/graphKit.cpp line 629:

> 627:     Node* ex_node = new_instance(makecon(ex_type), NULL, NULL, true);
> 628:     set_argument(0, ex_node);
> 629:     ciMethod* init = ex_ciInstKlass->find_method(ciSymbol::make("<init>"), ciSymbol::make("()V"));

I know that all exception classes have such a constructor, but in general you need to check for `nullptr`. I think it could be moved before the check at line 624.

src/hotspot/share/opto/graphKit.cpp line 640:

> 638:     address target = SharedRuntime::get_resolve_opt_virtual_call_stub();
> 639:
> 640:     CallStaticJavaNode *call = new CallStaticJavaNode(kit.C, TypeFunc::make(init), target, init);

At the end `<init>()` will call native `fillInStackTrace()` and nothing else: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Throwable.java#L255
Should we optimize it by inlining it here so that EA can eliminate above Allocation if it does not escape?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From dholmes at openjdk.java.net Thu Dec 2 02:10:27 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 2 Dec 2021 02:10:27 GMT
Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2]
In-Reply-To:
References:
Message-ID: <7LuaRE7ggJLUwX-IWbm_HR4jzqTb02z9-jqDFcLtz5M=.634c995b-3616-4145-911c-f639aee68b21@github.com>

On Tue, 30 Nov 2021 20:44:43 GMT, Aleksey Shipilev wrote:

>> I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there.
>>
>> Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors. But these tests are rather fast themselves and cover important compiler features.
>>
>> We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3.
>> >> Sample times for new subgroups (think about this as "How much time they add to existing tiers"): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 >> ============================== >> >> real 2m16.518s >> user 35m40.839s >> sys 1m35.334s >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 >> ============================== >> >> real 4m31.935s >> user 71m54.617s >> sys 2m13.073s > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Filter out tier1/2 groups too Marked as reviewed by dholmes (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6622 From dholmes at openjdk.java.net Thu Dec 2 02:10:28 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 2 Dec 2021 02:10:28 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: References: <72-LBIFmV82Z0GRieH-cxjyrub8Z-VVC4izGsW_bc4A=.1e021a98-3a4e-4ccc-87a7-c3e01c26d5e5@github.com> Message-ID: <1VQI2YTryJdLrkpNHAlnHJ9Mh6AlcQb0EoxQvVZkc2k=.6dff5cf5-7d24-4949-a245-e308bb5a2934@github.com> On Wed, 1 Dec 2021 20:56:26 GMT, Vladimir Kozlov wrote: >>> @shipilev again I think we need to examine this in terms of impact to our CI. We run different platforms and configurations in different tiers so the costs are not as simple as looking at one run. >> >> Again, I can wait for those who have more insight in Oracle testing pipelines check their workflows with this change. I have no insight in Oracle infra, so somebody else have to do it. Now that Igor left, who should we be talking to? > >> > @shipilev again I think we need to examine this in terms of impact to our CI. 
We run different platforms and configurations in different tiers so the costs are not as simple as looking at one run. >> >> Again, I can wait for those who have more insight in Oracle testing pipelines check their workflows with this change. I have no insight in Oracle infra, so somebody else have to do it. Now that Igor left, who should we be talking to? > > @dholmes-ora I checked and this change does not interfere with our CI. `tier2` and `tier3` introduced by #5241 are not used by our CI. New `tier2_compiler` and `tier3_compiler` groups are also not used. We use different sets in CI. I am not sure how else it can affect our testing. > > I also submitted our testing. I will let you know results. @vnkozlov thanks for that! I didn't realize the HS testing was so isolated from the jtreg group definitions. Thanks for your patience @shipilev . ------------- PR: https://git.openjdk.java.net/jdk/pull/6622 From jiefu at openjdk.java.net Thu Dec 2 02:32:29 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 2 Dec 2021 02:32:29 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6] In-Reply-To: References: Message-ID: <3scFWQBWPLI0rgMH_D2n02JqJdXEGtlz_5s2Co6re40=.57cbe0a5-6f91-4ec5-958d-a3a5a3bddec8@github.com> On Wed, 1 Dec 2021 23:19:47 GMT, Jie Fu wrote: > > Yes, the patch doesn't change behavior on AVX2 and older AVX512 systems. > > Thanks for your clarification. But it still remains unknown why the 64-byte instructions shouldn't be used on CPUs which don't support `serialize`. > > I will test the 64-byte instructions on older AVX512 systems today and feedback here. Here is the performance data on our older AVX512 platform which doesn't support `serialize`. Even without `serialize` , the performance has been improved with 64-byte instructions. E.g., for `ArrayCopy.arrayCopyObjectNonConst`, it has been improved by ~15%. So it seems unfair only enable 64-byte instructions for the latest Intel AVX512 platforms. 
Still, I would like to know why we don't use 64-byte instructions on platforms without `serialize` support.
Thanks.

---------------------------------------------------

Results with 32-byte instructions.

==> perf32-1.log <==
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  24.070 ± 0.013  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.517 ± 0.023  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  21.127 ± 0.008  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.934 ± 0.009  ns/op

==> perf32-2.log <==
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  24.511 ± 0.027  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.240 ± 0.034  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  21.065 ± 0.013  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.956 ± 0.161  ns/op

==> perf32-3.log <==
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  25.357 ± 0.006  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.513 ± 1.468  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.984 ± 0.024  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  20.945 ± 1.346  ns/op

Results with 64-byte instructions.

==> perf64-1.log <==
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  23.425 ± 0.003  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  23.530 ± 0.002  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.174 ± 0.074  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  19.942 ± 0.134  ns/op

==> perf64-2.log <==
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  22.429 ± 0.012  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  25.189 ± 0.031  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.093 ± 0.004  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  20.400 ± 1.213  ns/op

==> perf64-3.log <==
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt    5  23.472 ± 0.002  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt    5  23.534 ± 0.031  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.232 ± 0.150  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.921 ± 0.008  ns/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From david.holmes at oracle.com Thu Dec 2 02:46:56 2021
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 2 Dec 2021 12:46:56 +1000
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs
In-Reply-To:
References: <7At_ag4fpFbiKQ81BsVoM9MzmXJJCIPufYSoi1WgIpg=.e2085ff5-34d8-4c7a-a561-9b36100f14a4@github.com> <869rFsjOuFR2Yt9bfmz45l4s6uRaFQc5IkPW3eJyZ8E=.e1a6695f-258d-4ec6-a28a-9fcba5e53ce9@github.com>
Message-ID: <6180c804-3396-774d-8cda-5c3900c8a0f4@oracle.com>

On 1/12/2021 11:54 pm, Jie Fu wrote:
> On Wed, 1 Dec 2021 12:41:14 GMT, David Holmes wrote:
>
>> From my previous questions on this it was indicated that any CPU that supports `serialize` has the improved performance.
>
> If so, CPUs that don't support `serialize` would behave as before.
> Then there shouldn't be any performance regression.

Yes, which is exactly why we have been saying this should not affect "old" CPUs.

David

> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/6512
>

From david.holmes at oracle.com Thu Dec 2 02:51:46 2021
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 2 Dec 2021 12:51:46 +1000
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6]
In-Reply-To: <3scFWQBWPLI0rgMH_D2n02JqJdXEGtlz_5s2Co6re40=.57cbe0a5-6f91-4ec5-958d-a3a5a3bddec8@github.com>
References: <3scFWQBWPLI0rgMH_D2n02JqJdXEGtlz_5s2Co6re40=.57cbe0a5-6f91-4ec5-958d-a3a5a3bddec8@github.com>
Message-ID: <0bae5279-90e5-3f13-61ff-820684c049f9@oracle.com>

On 2/12/2021 12:32 pm, Jie Fu wrote:
> On Wed, 1 Dec 2021 23:19:47 GMT, Jie Fu wrote:
>
>>> Yes, the patch doesn't change behavior on AVX2 and older AVX512 systems.
>>
>> Thanks for your clarification.
>> But it still remains unknown why the 64-byte instructions shouldn't be used on CPUs which don't support `serialize`.
>>
>> I will test the 64-byte instructions on older AVX512 systems today and feedback here.
>
>
> Here is the performance data on our older AVX512 platform which doesn't support `serialize`.
>
> Even without `serialize` , the performance has been improved with 64-byte instructions.
> E.g., for `ArrayCopy.arrayCopyObjectNonConst`, it has been improved by ~15%.
>
> So it seems unfair only enable 64-byte instructions for the latest Intel AVX512 platforms.
>
> Still, I would like to know why we don't use 64-byte instructions on platforms without `serialize` support.

Because, as previously stated, there is no actual way to identify those CPUs. But we know that if they support serialize then they also support the faster 64-byte ops. But that doesn't mean that if they don't support serialize that they don't support the faster 64-byte ops. So all that is available for choosing whether to use them or not is whether serialize is supported.

David

> Thanks.
>
> ---------------------------------------------------
>
> Results with 32-byte instructions.
>
> ==> perf32-1.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  24.070 ± 0.013  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.517 ± 0.023  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  21.127 ± 0.008  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.934 ± 0.009  ns/op
>
> ==> perf32-2.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  24.511 ± 0.027  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.240 ± 0.034  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  21.065 ± 0.013  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.956 ± 0.161  ns/op
>
> ==> perf32-3.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  25.357 ± 0.006  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  27.513 ± 1.468  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.984 ± 0.024  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  20.945 ± 1.346  ns/op
>
>
>
> Results with 64-byte instructions.
>
> ==> perf64-1.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  23.425 ± 0.003  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  23.530 ± 0.002  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.174 ± 0.074  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  19.942 ± 0.134  ns/op
>
> ==> perf64-2.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  22.429 ± 0.012  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  25.189 ± 0.031  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.093 ± 0.004  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  20.400 ± 1.213  ns/op
>
> ==> perf64-3.log <==
> Benchmark                                    Mode  Cnt   Score   Error  Units
> ArrayCopy.arrayCopyObject                    avgt    5  23.472 ± 0.002  ns/op
> ArrayCopy.arrayCopyObjectNonConst            avgt    5  23.534 ± 0.031  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward  avgt    5  20.232 ± 0.150  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward   avgt    5  21.921 ± 0.008  ns/op
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/6512
>

From jiefu at openjdk.java.net Thu Dec 2 03:38:28 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Thu, 2 Dec 2021 03:38:28 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Nov 2021 00:10:39 GMT, Sandhya Viswanathan wrote:

>> Currently 32-byte instructions are used for small array copy and clear.
>> This can be optimized by using 64-byte instructions.
>>
>> Please review.
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix whitespace

LGTM

-------------

Marked as reviewed by jiefu (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6512

From jiefu at openjdk.java.net Thu Dec 2 03:38:28 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Thu, 2 Dec 2021 03:38:28 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs
In-Reply-To: <0bae5279-90e5-3f13-61ff-820684c049f9@oracle.com>
References: <0bae5279-90e5-3f13-61ff-820684c049f9@oracle.com>
Message-ID:

On Thu, 2 Dec 2021 02:53:52 GMT, David Holmes wrote:

> Because, as previously stated, there is no actual way to identify those CPUs. But we know that if they support serialize then they also support the faster 64-byte ops. But that doesn't mean that if they don't support serialize that they don't support the faster 64-byte ops. So all that is available for choosing whether to use them or not is whether serialize is supported.

OK, makes sense!
Since it won't make things worse for the "old" systems, I'm fine with it.
Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From stuefe at openjdk.java.net Thu Dec 2 05:12:29 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 2 Dec 2021 05:12:29 GMT
Subject: RFR: 8277990: NMT: Remove NMT shutdown capability
In-Reply-To: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com>
References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com>
Message-ID:

On Wed, 1 Dec 2021 16:00:09 GMT, Zhengyu Gu wrote:

> NMT shutdown functionality is a remnant of its first implementation, which could consume an excessive amount of memory and therefore needed the capability to shut itself down to ensure the health of the JVM. This is no longer the case for the current implementation.
> > After JDK-8277946, there is no longer a use, so it can be removed. > > Test: > - [x] hotspot_nmt > - [x] tier1 with NMT on Very nice simplification. Before diving in, could you explain a bit the locking changes in MallocSiteTable? To me, the association with shutdown is not immediately clear. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From kvn at openjdk.java.net Thu Dec 2 07:32:29 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Dec 2021 07:32:29 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: References: Message-ID: <_tpNEJ7RnKYConrel3BdMSsn7DRHOZX1i1lqX3yVZD8=.82735e32-7713-4d6f-8142-9bba01cbee48@github.com> On Tue, 30 Nov 2021 20:44:43 GMT, Aleksey Shipilev wrote: >> I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there. >> >> Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors. But these tests are rather fast themselves and cover important compiler features. >> >> We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3. 
>> >> Sample times for new subgroups (think about this as "How much time they add to existing tiers"): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 >> ============================== >> >> real 2m16.518s >> user 35m40.839s >> sys 1m35.334s >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 >> ============================== >> >> real 4m31.935s >> user 71m54.617s >> sys 2m13.073s > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Filter out tier1/2 groups too I don't see issues with these changes in my testing. I submitted our tier1,2,3 testing in internal infra. ------------- PR: https://git.openjdk.java.net/jdk/pull/6622 From neliasso at openjdk.java.net Thu Dec 2 09:11:21 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 2 Dec 2021 09:11:21 GMT Subject: RFR: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 09:24:45 GMT, Patric Hedlin wrote: > Implementation of MD5 intrinsic support for AArch64. > > Contributed by Ludovic Henry (@luhenry). 
> > Speedup measured (in Aurora running Ampere Altra) as follows: > > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1048576-provider:...29.39% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2047-provider:.........28.91% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2048-provider:.........28.81% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1023-provider:.........28.43% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1024-provider:.........28.32% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:511-provider:...........27.78% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:512-provider:...........27.62% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:255-provider:...........26.52% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:256-provider:...........26.38% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:127-provider:...........25.41% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:128-provider:...........24.66% > > Testing tier1-7. Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6628 From duke at openjdk.java.net Thu Dec 2 09:20:53 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 2 Dec 2021 09:20:53 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v8] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. 
> If an attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
>
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
>
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
>
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
>
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
>
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of them
> and the total code change is fairly large.
> I'm happy to provide a few
> work in progress patches.
>
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.

Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:

  Fix up UseROPProtection flag

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6334/files
  - new: https://git.openjdk.java.net/jdk/pull/6334/files/280abc41..995d8aa3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=06-07

  Stats: 18 lines in 1 file changed: 10 ins; 5 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net Thu Dec 2 09:20:57 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Thu, 2 Dec 2021 09:20:57 GMT
Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v7]
In-Reply-To:
References:
Message-ID: <5IotnN-721orMatHakGGJ0sIMpiWb506D-mrK5aFMCI=.45757405-3547-498e-b280-3e5f42e9eda6@github.com>

On Mon, 22 Nov 2021 17:35:41 GMT, Alan Hayward wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>>
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this.
>> By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>>
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>>
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>>
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>>
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>>
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 13 commits: > > - Merge master > - Merge master > - Rename pauth_authenticate_or_strip_return_address > - Fix windows aarch64 by restoring pauth file split > - Don't keep LR live across restore_live_registers > - Merge master > - Document pauth functions && remove OS split > - Update UseROPProtection description > - Simplify branch protection configure check > - 8264130: PAC-RET protection for Linux/AArch64 > > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. 
> If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
>
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of them
> and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
>
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.
> - ... and 3 more: https://git.openjdk.java.net/jdk/compare/ca31ed53...280abc41

@dholmes-ora

Fixed flags based on comments on the CSR:

> However, the proposed implementation does not match the description. VM_Version::initialize() is called after argument processing so even if the user has explicitly set the flag to false, if this is a PAC enabled system it will be turned back on. The code needs to check if the flag has been set before overriding the value; further the warning at line 389 should only be given if the user actually turned the flag on. But this should be taken up in the PR.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From phedlin at openjdk.java.net Thu Dec 2 09:28:30 2021
From: phedlin at openjdk.java.net (Patric Hedlin)
Date: Thu, 2 Dec 2021 09:28:30 GMT
Subject: RFR: 8251216: Implement MD5 intrinsics on AArch64
In-Reply-To:
References:
Message-ID:

On Wed, 1 Dec 2021 09:24:45 GMT, Patric Hedlin wrote:

> Implementation of MD5 intrinsic support for AArch64.
> > Contributed by Ludovic Henry (@luhenry). > > Speedup measured (in Aurora running Ampere Altra) as follows: > > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1048576-provider:...29.39% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2047-provider:.........28.91% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2048-provider:.........28.81% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1023-provider:.........28.43% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1024-provider:.........28.32% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:511-provider:...........27.78% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:512-provider:...........27.62% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:255-provider:...........26.52% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:256-provider:...........26.38% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:127-provider:...........25.41% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:128-provider:...........24.66% > > Testing tier1-7. Thank you for reviewing @aph and @neliasso. Thank you for contributing the patch @luhenry. Thank you for commenting on the issue @cl4es. ------------- PR: https://git.openjdk.java.net/jdk/pull/6628 From phedlin at openjdk.java.net Thu Dec 2 09:28:31 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Thu, 2 Dec 2021 09:28:31 GMT Subject: Integrated: 8251216: Implement MD5 intrinsics on AArch64 In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 09:24:45 GMT, Patric Hedlin wrote: > Implementation of MD5 intrinsic support for AArch64. > > Contributed by Ludovic Henry (@luhenry). 
> > Speedup measured (in Aurora running Ampere Altra) as follows: > > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1048576-provider:...29.39% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2047-provider:.........28.91% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:2048-provider:.........28.81% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1023-provider:.........28.43% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:1024-provider:.........28.32% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:511-provider:...........27.78% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:512-provider:...........27.62% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:255-provider:...........26.52% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:256-provider:...........26.38% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:127-provider:...........25.41% > openjdk.bench.javax.crypto.full.MessageDigestBench.digest-algorithm:MD5-dataSize:128-provider:...........24.66% > > Testing tier1-7. This pull request has now been integrated. 
Changeset: 088b244e Author: Patric Hedlin URL: https://git.openjdk.java.net/jdk/commit/088b244ec6d9393a1fcd2233fa5b4cf46f9ae0dd Stats: 199 lines in 4 files changed: 193 ins; 1 del; 5 mod 8251216: Implement MD5 intrinsics on AArch64 Co-authored-by: Ludovic Henry Reviewed-by: aph, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/6628 From shade at openjdk.java.net Thu Dec 2 09:35:29 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Dec 2021 09:35:29 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v5] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 11:26:04 GMT, Andrew Haley wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace >> - Fix a comment >> - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace >> - More reviews >> - Review feedback >> - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace >> - Initial work: runs async-profiler successfully > > src/hotspot/cpu/zero/frame_zero.cpp line 139: > >> 137: assert(is_interpreted_frame(), "Not an interpreted frame"); >> 138: // These are reasonable sanity checks >> 139: if (fp() == 0 || (intptr_t(fp()) & (wordSize-1)) != 0) { > > Use `is_aligned()` here? @theRealAph Do you agree with above? Any more comments? 
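
The `is_aligned()` suggestion in the quoted review comment amounts to replacing the open-coded mask test with a named helper. A minimal standalone sketch of that equivalence follows. It is not the HotSpot code: `kWordSize`, `is_aligned_sketch`, `fp_rejected_masked`, `fp_rejected_helper` and `checks_agree` are hypothetical names invented for this illustration, and the word size is assumed to be the pointer size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: stand-in for HotSpot's wordSize, taken here as the pointer size.
const std::size_t kWordSize = sizeof(void*);

// Hypothetical helper modelled on the reviewer's suggestion.
// The alignment must be a non-zero power of two.
bool is_aligned_sketch(const void* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Open-coded form, mirroring the shape of the check in the quoted hunk:
// reject a null or misaligned frame pointer.
bool fp_rejected_masked(const void* fp) {
  return fp == nullptr ||
         (reinterpret_cast<std::uintptr_t>(fp) & (kWordSize - 1)) != 0;
}

// Same decision expressed through the helper.
bool fp_rejected_helper(const void* fp) {
  return fp == nullptr || !is_aligned_sketch(fp, kWordSize);
}

// Walks every byte offset in a small aligned buffer and confirms that
// both forms make the same decision for each address.
bool checks_agree() {
  alignas(16) unsigned char buf[32] = {};
  for (std::size_t i = 0; i < sizeof(buf); i++) {
    if (fp_rejected_masked(buf + i) != fp_rejected_helper(buf + i)) {
      return false;
    }
  }
  return fp_rejected_masked(nullptr) && fp_rejected_helper(nullptr);
}
```

Both forms make the same decision; the helper only names the intent and keeps the power-of-two precondition in one place.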
------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From aph at openjdk.java.net Thu Dec 2 10:08:25 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 2 Dec 2021 10:08:25 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5] In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 16:00:30 GMT, Erik Gahlin wrote: > I ran some benchmarks as well: http://cr.openjdk.java.net/~jvernee/UnsafeTest.java > > I see about a 6 ns increase in benchmark times with the new coded added in (regardless of allocation size), which sounds about right. An unsafe allocation and free takes about 90 ns on my machine with the latest JDK, so the regression is ~6%. (I'm not sure if that's worth worrying about, see below). Computers can get a lot done in 6ns. Why can't the JFR event be conditional on a simple flag? ------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From stuefe at openjdk.java.net Thu Dec 2 10:47:24 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 2 Dec 2021 10:47:24 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 11:04:44 GMT, xpbob wrote: >> Unsafe is used in many Java frameworks. >> When the framework has a unsafe memory leak , there is no way to know what code is causing it. >> Add unsafe allocation event to jfr. >> Records the size and stack allocated. >> This event is off by default > > xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8277930 > - remove whitespace > - add free and Reallocate event > - Merge branch 'openjdk:master' into JDK-8277930 > - 8277930: Add unsafe allocation event to jfr > When the framework has a unsafe memory leak , there is no way to know what code is causing it. 
Just a note, this is tracked by NMT (category `mtOther`). > Computers can get a lot done in 6ns. Why can't the JFR event be conditional on a simple flag? Yes, I dislike paying for something I don't use. We have seen customers using these APIs a lot, so it can be a hot path. The fact that malloc is used is an implementation detail not known to everyone. @egahlin > I also wonder if the events should be called NativeAllocation, NativeReallocation and NativeFree, so they are not tied so hard to the Unsafe implementation. But you would still limit this to allocations done by outside code, right? Otherwise, again, we have NMT for tracking native memory in the VM. ------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From egahlin at openjdk.java.net Thu Dec 2 10:57:24 2021 From: egahlin at openjdk.java.net (Erik Gahlin) Date: Thu, 2 Dec 2021 10:57:24 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5] In-Reply-To: References: Message-ID: On Thu, 2 Dec 2021 10:05:09 GMT, Andrew Haley wrote: > > I ran some benchmarks as well: http://cr.openjdk.java.net/~jvernee/UnsafeTest.java > > I see about a 6 ns increase in benchmark times with the new coded added in (regardless of allocation size), which sounds about right. An unsafe allocation and free takes about 90 ns on my machine with the latest JDK, so the regression is ~6%. (I'm not sure if that's worth worrying about, see below). > > Computers can get a lot done in 6ns. Why can't the JFR event be conditional on a simple flag? 
If they are written like this, it should be a simple check:

    EventUnsafeReallocate event;
    if (event.should_commit()) {
      event.set_allocationSize(sz);
      event.set_freeAddr(addr);
      event.set_allocAddr(reallocAddr);
      event.commit();
    }

-------------

PR: https://git.openjdk.java.net/jdk/pull/6591

From egahlin at openjdk.java.net Thu Dec 2 10:57:24 2021
From: egahlin at openjdk.java.net (Erik Gahlin)
Date: Thu, 2 Dec 2021 10:57:24 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5]
In-Reply-To:
References:
Message-ID: <_kShMx8mpE3yg1j0VqoO46RtWgJwTJ3bclOAkPwDmuc=.8ecfdca0-df04-45b2-9fe6-b3369b8dae52@github.com>

On Thu, 2 Dec 2021 10:43:56 GMT, Thomas Stuefe wrote:

> > When the framework has a unsafe memory leak , there is no way to know what code is causing it.
>
> Just a note, this is tracked by NMT (category `mtOther`).
>
> > Computers can get a lot done in 6ns. Why can't the JFR event be conditional on a simple flag?
>
> Yes, I dislike paying for something I don't use. We have seen customers using these APIs a lot, so it can be a hot path. The fact that malloc is used is an implementation detail not known to everyone.
>
> @egahlin
>
> > I also wonder if the events should be called NativeAllocation, NativeReallocation and NativeFree, so they are not tied so hard to the Unsafe implementation.
>
> But you would still limit this to allocations done by outside code, right? Otherwise, again, we have NMT for tracking native memory in the VM.

Yes, NativeAllocation could be misleading, perhaps JavaNativeAllocation?
There is already JavaExceptionThrow ------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From dholmes at openjdk.java.net Thu Dec 2 12:24:24 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 2 Dec 2021 12:24:24 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v7] In-Reply-To: <5IotnN-721orMatHakGGJ0sIMpiWb506D-mrK5aFMCI=.45757405-3547-498e-b280-3e5f42e9eda6@github.com> References: <5IotnN-721orMatHakGGJ0sIMpiWb506D-mrK5aFMCI=.45757405-3547-498e-b280-3e5f42e9eda6@github.com> Message-ID: On Thu, 2 Dec 2021 09:16:59 GMT, Alan Hayward wrote: > @dholmes-ora > > Fixed flags based on comments on the CSR: Flag updates look good - thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From shade at openjdk.java.net Thu Dec 2 12:57:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Dec 2021 12:57:38 GMT Subject: RFR: 8278143: Remove unused "argc" from ConstantPool::copy_bootstrap_arguments_at_impl Message-ID: SonarCloud complains about variable initialization in large conditional statements. In that tower of if-expressions, it is not even clear to me that `argc` would be properly initialized. Seems better to clean that up. 
Additional testing: - [x] Linux x86_64 fastdebug build - [ ] GHA ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/6671/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6671&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278143 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6671.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6671/head:pull/6671 PR: https://git.openjdk.java.net/jdk/pull/6671 From lfoltan at openjdk.java.net Thu Dec 2 13:40:24 2021 From: lfoltan at openjdk.java.net (Lois Foltan) Date: Thu, 2 Dec 2021 13:40:24 GMT Subject: RFR: 8278143: Remove unused "argc" from ConstantPool::copy_bootstrap_arguments_at_impl In-Reply-To: References: Message-ID: On Thu, 2 Dec 2021 12:49:45 GMT, Aleksey Shipilev wrote: > SonarCloud complains about variable initialization in large conditional statements. In that tower of if-expressions, it is not even clear to me that `argc` would be properly initialized. Seems better to clean that up. > > Additional testing: > - [x] Linux x86_64 fastdebug build > - [ ] GHA Looks good. Lois Marked as reviewed by lfoltan (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6671 From zgu at openjdk.java.net Thu Dec 2 13:42:28 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 2 Dec 2021 13:42:28 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability In-Reply-To: References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: On Thu, 2 Dec 2021 05:09:04 GMT, Thomas Stuefe wrote: > Very nice simplification. > > Before diving in, could you explain a bit the locking changes in MallocSiteTable? To me, the association with shutdown is not immediately clear. Thanks! You are talking about `AccessLock`, right? 
As @dholmes-ora mentioned in [PR #6267](https://github.com/openjdk/jdk/pull/6267), the name is misleading: it is **not** a lock, but a countdown latch. It allows multi-reader to access `MallocSiteTable`, but once an exclusive access is requested, the requester sets counter to negative number and waits all readers to exit, then no readers and writers are allowed, so that `MallocSiteTable` can be safely destroyed. `AccessLock` was invented to guard `MallocSiteTable`, as `ThreadCritical` is **too** expensive for malloc tracking. ------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From ecaspole at openjdk.java.net Thu Dec 2 14:33:23 2021 From: ecaspole at openjdk.java.net (Eric Caspole) Date: Thu, 2 Dec 2021 14:33:23 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v4] In-Reply-To: <41Q1gB0N_YPjDB9nr4l74d2o0QKaYKsjC60s9xg4nk8=.45d4275e-fd64-44aa-8adf-13e480612ea1@github.com> References: <41Q1gB0N_YPjDB9nr4l74d2o0QKaYKsjC60s9xg4nk8=.45d4275e-fd64-44aa-8adf-13e480612ea1@github.com> Message-ID: On Thu, 2 Dec 2021 00:20:52 GMT, Scott Gibbons wrote: >> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. 
>> >> 5986.947899319073 MB/s => 24041.05203089616 MB/s >> 5840.02689336947 MB/s => 24898.781468710356 MB/s >> >> ********** Original *********** >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 1.710387358 seconds >> CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds >> CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> >> >> >> >> *********** With my changes: ************* >> >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 0.425938099 seconds >> CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds >> CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- > > Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: > > - MICRO to MILLI as requested. 
> - Fixing benchmark to throughput with default iterations. The JMH part looks good. Thanks, Eric ------------- Marked as reviewed by ecaspole (Committer). PR: https://git.openjdk.java.net/jdk/pull/6595 From dcubed at openjdk.java.net Thu Dec 2 15:24:23 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 2 Dec 2021 15:24:23 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability In-Reply-To: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: <6KJgAcgyQwuXZkGBBQBA8mBtzbFhu4G3mQMrlhoLGSA=.9ce90496-9d90-46eb-86db-efaeed46b44c@github.com> On Wed, 1 Dec 2021 16:00:09 GMT, Zhengyu Gu wrote: > NMT shutdown functionality is a remnant of its first implementation, which could consume excessive amount of memory, therefore, it needed capability to shut it self down to ensure health of JVM. This is no longer a case for current implementation. > > After JDK-8277946, there is no longer a use, so it can be removed. > > Test: > - [x] hotspot_nmt > - [x] tier1 with NMT on Hmmm... sounds like a ReadersWriter lock to me... Any number of Readers and a single Writer after all the Readers have released... It has surprised me before that HotSpot does not have such a useful mechanism... 
------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From stuefe at openjdk.java.net Thu Dec 2 15:49:28 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 2 Dec 2021 15:49:28 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability In-Reply-To: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: On Wed, 1 Dec 2021 16:00:09 GMT, Zhengyu Gu wrote: > NMT shutdown functionality is a remnant of its first implementation, which could consume excessive amount of memory, therefore, it needed capability to shut it self down to ensure health of JVM. This is no longer a case for current implementation. > > After JDK-8277946, there is no longer a use, so it can be removed. > > Test: > - [x] hotspot_nmt > - [x] tier1 with NMT on > You are talking about AccessLock, right? > As @dholmes-ora mentioned in PR #6267, the name is misleading: it is not a lock, but a countdown latch. It allows multi-reader to access MallocSiteTable, but once an exclusive access is requested, the requester sets counter to negative number and waits all readers to exit, then no readers and writers are allowed, so that MallocSiteTable can be safely destroyed. > AccessLock was invented to guard MallocSiteTable, as ThreadCritical is too expensive for malloc tracking. Just to see I get it right, AccessLock is not needed anymore because we don't destroy the MallocSiteTable? We don't need it to guard the table when we add callstacks? 
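
For readers following the thread, the countdown-latch behaviour described in the quoted text can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the HotSpot `AccessLock` implementation: the class name `AccessLatchSketch`, its method names, and the use of `std::atomic` with `INT_MIN` as the negative bias are all invented for the example, and it assumes a single requester of exclusive access.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <thread>

// Hypothetical sketch of a "shared access until tripped" latch.
// Readers increment a counter on entry and decrement it on exit. A single
// requester of exclusive access biases the counter negative, which turns
// away new readers, and then waits for the readers already inside to leave.
class AccessLatchSketch {
 public:
  // Try to enter as a reader; fails once the latch has been tripped.
  bool try_enter_shared() {
    int after = count_.fetch_add(1, std::memory_order_acquire) + 1;
    if (after < 0) {
      // Latch already tripped: undo the increment and report failure.
      count_.fetch_sub(1, std::memory_order_release);
      return false;
    }
    return true;
  }

  // Leave after a successful try_enter_shared().
  void exit_shared() {
    count_.fetch_sub(1, std::memory_order_release);
  }

  // Trip the latch. Assumption: called at most once, by one thread.
  // Afterwards no reader can enter, so the guarded data can be destroyed.
  void acquire_exclusive() {
    int before = count_.fetch_add(INT_MIN, std::memory_order_acq_rel);
    assert(before >= 0 && "exclusive access must be requested only once");
    (void)before;
    // Wait until every reader that was inside has left.
    while (count_.load(std::memory_order_acquire) != INT_MIN) {
      std::this_thread::yield();
    }
  }

  // True once exclusive access has been requested.
  bool is_tripped() const {
    return count_.load(std::memory_order_acquire) < 0;
  }

 private:
  std::atomic<int> count_{0};
};
```

Nothing in the sketch ever resets the counter, which is why it behaves as a one-shot latch rather than a reusable readers-writer lock. If the table is never destroyed, nothing ever calls the exclusive path, and the shared path becomes pure overhead.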
------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From zgu at openjdk.java.net Thu Dec 2 15:49:28 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 2 Dec 2021 15:49:28 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability In-Reply-To: <6KJgAcgyQwuXZkGBBQBA8mBtzbFhu4G3mQMrlhoLGSA=.9ce90496-9d90-46eb-86db-efaeed46b44c@github.com> References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> <6KJgAcgyQwuXZkGBBQBA8mBtzbFhu4G3mQMrlhoLGSA=.9ce90496-9d90-46eb-86db-efaeed46b44c@github.com> Message-ID: On Thu, 2 Dec 2021 15:21:13 GMT, Daniel D. Daugherty wrote: > Hmmm... sounds like a ReadersWriter lock to me... Any number of Readers and a single Writer after all the Readers have released... It has surprised me before that HotSpot does not have such a useful mechanism... Well, the difference is that, it does not have well defined critical section. It depends on other structures/operations for data integrity. ------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From zgu at openjdk.java.net Thu Dec 2 15:55:25 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 2 Dec 2021 15:55:25 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability In-Reply-To: References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: On Thu, 2 Dec 2021 15:43:58 GMT, Thomas Stuefe wrote: > > You are talking about AccessLock, right? > > > As @dholmes-ora mentioned in PR #6267, the name is misleading: it is not a lock, but a countdown latch. It allows multi-reader to access MallocSiteTable, but once an exclusive access is requested, the requester sets counter to negative number and waits all readers to exit, then no readers and writers are allowed, so that MallocSiteTable can be safely destroyed. > > > AccessLock was invented to guard MallocSiteTable, as ThreadCritical is too expensive for malloc tracking. 
> > Just to see I get it right, AccessLock is not needed anymore because we don't destroy the MallocSiteTable? We don't need it to guard the table when we add callstacks? Yes. Ironically, adding entry only requires shared access, it uses `CAS` to ensure table integrity :-) If we don't destroy the table, there is no requester for exclusive access, so it is no longer needed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From stuefe at openjdk.java.net Thu Dec 2 16:09:24 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 2 Dec 2021 16:09:24 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability In-Reply-To: References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: On Thu, 2 Dec 2021 15:52:24 GMT, Zhengyu Gu wrote: > Yes. Ironically, adding entry only requires shared access, it uses `CAS` to ensure table integrity :-) If we don't destroy the table, there is no requester for exclusive access, so it is no longer needed. I see. That's nice. Thanks for the clarification. ------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From stuefe at openjdk.java.net Thu Dec 2 16:26:25 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 2 Dec 2021 16:26:25 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability In-Reply-To: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: On Wed, 1 Dec 2021 16:00:09 GMT, Zhengyu Gu wrote: > NMT shutdown functionality is a remnant of its first implementation, which could consume excessive amount of memory, therefore, it needed capability to shut it self down to ensure health of JVM. This is no longer a case for current implementation. > > After JDK-8277946, there is no longer a use, so it can be removed. 
> > Test: > - [x] hotspot_nmt > - [x] tier1 with NMT on Nice cleanup. You missed a few: - test/lib/jdk/test/whitebox/WhiteBox.java, `NMTChangeTrackingLevel` - test/lib/sun/hotspot/WhiteBox.java, same Will the removal of WB functions cause incompatibilities with older versions of jtreg (can't imagine it does, just wondering)? Cheers, Thomas src/hotspot/os/posix/perfMemory_posix.cpp line 1031: > 1029: static void unmap_shared(char* addr, size_t bytes) { > 1030: int res; > 1031: if (MemTracker::tracking_level() >= NMT_summary) { Just an idle thought, maybe we could add a `MemTracker::is_enabled()` as a shortcut for level > off. src/hotspot/share/services/nmtCommon.hpp line 46: > 44: // "summary": after initialization with NativeMemoryTracking=summary - NMT in summary mode > 45: // - category summaries per tag are tracked > 46: // - thread stacks are tracked Unrelated to your patch, while looking at this I notice that my old comment about "summary" was wrong: with "summary", the malloc site table is not allocated and does not get used. Since you are here, could you fix the comment? Thanks! src/hotspot/share/services/nmtCommon.hpp line 67: > 65: NMT_off, > 66: NMT_summary, > 67: NMT_detail I'm fine with removing the explicit numbers here, but could you please add STATIC_ASSERT(off > unknown) STATIC_ASSERT(summary > off) STATIC_ASSERT(detail > summary) somewhere? (Maybe in the cpp file, it's not such an eyesore there). ------------- Changes requested by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6640 From eastig at amazon.co.uk Thu Dec 2 17:16:26 2021 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Thu, 2 Dec 2021 17:16:26 +0000 Subject: [External] : RFC: improving NMethod code locality in CodeCache Message-ID: <22137786-0C8C-4C20-A7F7-89D9107C4D90@amazon.com> Hi Tobias, Thank you for your comments and references. > Is it really a problem with branch prediction or more with instruction caching? This is a problem with dynamic branch prediction.
For example, to improve branch prediction on Graviton2 we recommend to disable tiered compilation and to restrict the size of the code cache (see https://github.com/aws/aws-graviton-getting-started/blob/main/java.md). It has shown large (1.5x) improvements in some Java workloads. > (Re-)moving the metadata will improve locality but does that really have an effect on branch prediction? Code placement also affects BTB which is part of dynamic branch prediction. See: https://arxiv.org/pdf/1804.00261.pdf "A Survey of Techniques for Dynamic Branch Prediction" https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.2910&rep=rep1&type=pdf "The Effect of Code Reordering on Branch Prediction" > Did you gather some numbers via hardware performance counters (iCache, ITLB, branch prediction misses)? I don't have numbers for DaCapo and Renaissance benchmarks. We found the branch prediction problem with CodeCache in our Java workloads by analysing hardware performance counters and creating a dynamic map of CodeCache hot regions. 80% - 90% of hot code was C2 nmethods. The map of hot regions showed the code of those C2 nmethods was sparse. The analysis also showed nmethods had high ratio non-executable data (aka metadata) vs code. DaCapo and Renaissance benchmarks confirmed this was not specific to our Java workloads. > There is lots of code that depends on the current layout and we would need to make all of that dependent on a flag. Yes, this is one of challenges, maybe the biggest one. > > 2. Where to put: > > a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin. > > b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset. > > c. Or in a completely different place (C-heap, Metaspace,...) 
> > It depends on what we want to improve: (i) Code locality in the same nmethod or (ii) code locality > between different nmethods. We want to improve code locality between different methods. For example, CodeCache has nmethods sequentially: A,B,C,D,E,F,G. There are two hottest call chains: A->E->G and C->G. We can relocate A, C, E and G close to each other to help BTB: A, C, E, G, B, D, F. The problem is that we have to do this each time the hottest call chains/graphs change. With metadata removed, the code of all nmethods A - G can be covered by the BTB. No need for relocations. In theory, at some point, e.g. 50000 and more c2 nmethods, this will stop working. We'll need to do relocations. BTW, I tried to relocate NMethod. It is not easy to implement because not being relocated is a part of the NMethod design. Once a nmethod gets into CodeCache it stays at the same place till it dies. > Solution b) would only improve code locality in the same nmethod but the overall layout of > executable code in the code cache would still be sparse. Why would b) not improve code locality of all methods? If methods use 64Mb we will have 32Mb of metadata at the beginning and 32Mb of code at the end of a segment. The code of the methods would be close to each other. > I think c) would be the ideal solution: The code cache would only contain executable code and all > the metadata would be somewhere else. But solution a) would lead to the same layout and might be > easier to implement. I think b) might be easier to implement. I need to go to some low-level design to estimate complexity. I've created a review of a design document: https://github.com/eastig/codecache/pull/1. I'll be updating it according to the email discussion. Thanks, Evgeny On 29/11/2021, 09:02, "Tobias Hartmann" wrote: Hi Evgeny, Thanks for sharing these results and starting the discussion. Some comments below. On 23.11.21 18:34, Astigeevich, Evgeny wrote: > We have cases where the CodeCache contains more than 15,000 compiled methods.
In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded. Is it really a problem with branch prediction or more with instruction caching? With the current implementation, the hot instructions of a single nmethod are already contiguous but different nmethods might be located far away (and there's lots of metadata in-between). (Re-)moving the metadata will improve locality but does that really have an effect on branch prediction? Did you gather some numbers via hardware performance counters (iCache, ITLB, branch prediction misses)? > The data show that due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show the most contributors of non-executable data are the header and scopes sections. Arm64 vs x86_64 looks consistent except the stub code. On arm64 the size of the stub code is 4-5 times bigger. > > We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data. It would definitely be nice to have this as an option (rather than replacing the current implementation) but I wonder how feasible it is. There is lots of code that depends on the current layout and we would need to make all of that dependent on a flag. > According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache. Different parts of memory for a nmethod need to be allocated/released. Ever since I finished the implementation of the Segmented Code Cache (https://openjdk.java.net/jeps/197), I wanted to work on this but never got to it.
I think that the additional complexity in the code cache is worth it but of course that has to be proven by a performance evaluation. For reference, here's my old thesis and the paper we published back then: http://cr.openjdk.java.net/~thartmann/papers/2014-Code_Cache_Optimizations-thesis.pdf http://cr.openjdk.java.net/~thartmann/papers/2014-PPPJ-Efficient_Code_Cache_Management.pdf > There is JDK-7072317 "move metadata from CodeCache" (https://bugs.openjdk.java.net/browse/JDK-7072317) which the implementation works can be done under. Yes, that makes sense. > There can be different approaches for the implementation: > > 1. What to separate: > a. All code (main plus stub) from other sections. > b. Or only main code because this is the code where an application should spend most of the time. > c. Or the header and scope sections. I would say that from a performance perspective, only the main code matters because the stubs are used for slow paths. If it simplifies prototyping, I would go with b) first. > 2. Where to put: > a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin. > b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset. > c. Or in a completely different place (C-heap, Metaspace,...) It depends on what we want to improve: (i) Code locality in the same nmethod or (ii) code locality between different nmethods. Solution b) would only improve code locality in the same nmethod but the overall layout of executable code in the code cache would still be sparse. I think c) would be the ideal solution: The code cache would only contain executable code and all the metadata would be somewhere else. But solution a) would lead to the same layout and might be easier to implement.
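To make option 2b from the quoted list concrete (one segment, code growing up from low addresses and metadata growing down from high addresses), here is a toy two-ended bump allocator. It is a sketch under simplifying assumptions: it works on offsets rather than real memory, never frees, and the name `TwoEndedSegment` is invented for illustration; nothing here reflects the actual CodeCache or CodeHeap code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of one code-cache segment shared by code and metadata.
class TwoEndedSegment {
  std::size_t _code_top;     // first free offset above the code area
  std::size_t _data_bottom;  // lowest offset used by the metadata area

 public:
  static constexpr std::size_t kFail = static_cast<std::size_t>(-1);

  explicit TwoEndedSegment(std::size_t size)
      : _code_top(0), _data_bottom(size) {}

  std::size_t free_bytes() const { return _data_bottom - _code_top; }

  // Code is packed upwards from offset 0, so successive nmethod bodies
  // end up next to each other.
  std::size_t alloc_code(std::size_t bytes) {
    if (bytes > free_bytes()) return kFail;
    std::size_t offset = _code_top;
    _code_top += bytes;
    return offset;
  }

  // Metadata is packed downwards from the end of the segment.
  std::size_t alloc_data(std::size_t bytes) {
    if (bytes > free_bytes()) return kFail;
    _data_bottom -= bytes;
    return _data_bottom;
  }
};

// Two nmethods: their code is adjacent, their metadata sits at the far end.
inline void two_ended_segment_demo() {
  TwoEndedSegment seg(1024);
  assert(seg.alloc_code(100) == 0);    // code of nmethod A
  assert(seg.alloc_data(200) == 824);  // metadata of nmethod A
  assert(seg.alloc_code(50) == 100);   // code of nmethod B follows A's code
}
```

In this model the code of all nmethods stays contiguous regardless of how much metadata each one carries, which is the locality property the thread is after; handling deallocation and fragmentation is where the real complexity would lie.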
> It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property. Yes, that is a concern. A thorough performance evaluation is required. > We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317. Hope that helps. I'm curious what others think. Best regards, Tobias Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. From sviswanathan at openjdk.java.net Thu Dec 2 17:46:17 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 2 Dec 2021 17:46:17 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs In-Reply-To: References: <7At_ag4fpFbiKQ81BsVoM9MzmXJJCIPufYSoi1WgIpg=.e2085ff5-34d8-4c7a-a561-9b36100f14a4@github.com> <869rFsjOuFR2Yt9bfmz45l4s6uRaFQc5IkPW3eJyZ8E=.e1a6695f-258d-4ec6-a28a-9fcba5e53ce9@github.com> Message-ID: On Wed, 1 Dec 2021 12:41:14 GMT, David Holmes wrote: >>> As I understand it - old AVX512 platforms will continue to work as before. >> >> According to @sviswa7 's comments (no cupid bit for the latest ISA), `is_intel_family_core() && supports_serialize()` can't distinguish all the old AVX512 platforms from the latest ones. >> So I think it may be possible some old AVX512 machines will behave differently after this opt. >> >> @sviswa7 , can you further explain what's the difference of the 64-byte instructions between Intel's old and latest AVX512 platforms? >> Why can't we enable them as default on old platforms? >> Thanks. > >> > As I understand it - old AVX512 platforms will continue to work as before. 
>> >> According to @sviswa7 's comments (no cupid bit for the latest ISA), `is_intel_family_core() && supports_serialize()` can't distinguish all the old AVX512 platforms from the latest ones. So I think it may be possible some old AVX512 machines will behave differently after this opt. > > I do not see such comments. From my previous questions on this it was indicated that any CPU that supports `serialize` has the improved performance. @dholmes-ora @DamonFool @jatin-bhateja @neliasso Thanks a lot for the review. If no further objections, I plan to integrate this PR tomorrow (Friday 12/3). ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From zgu at openjdk.java.net Thu Dec 2 18:13:17 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 2 Dec 2021 18:13:17 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability In-Reply-To: References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: On Thu, 2 Dec 2021 16:09:31 GMT, Thomas Stuefe wrote: >> NMT shutdown functionality is a remnant of its first implementation, which could consume excessive amount of memory, therefore, it needed capability to shut it self down to ensure health of JVM. This is no longer a case for current implementation. >> >> After JDK-8277946, there is no longer a use, so it can be removed. >> >> Test: >> - [x] hotspot_nmt >> - [x] tier1 with NMT on > > src/hotspot/os/posix/perfMemory_posix.cpp line 1031: > >> 1029: static void unmap_shared(char* addr, size_t bytes) { >> 1030: int res; >> 1031: if (MemTracker::tracking_level() >= NMT_summary) { > > Just an idle thought, maybe we could add a `MemTracker::is_enabled()` as a shortcut for level > off. I was thinking about doing it in followup, yea, why not here. 
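A minimal sketch of the suggested shortcut, assuming the tracking levels are ordered unknown < off < summary < detail as the review comments imply. `MemTrackerSketch` and `set_level` are stand-ins invented for this example, not the HotSpot `MemTracker` class or the change that was actually pushed.

```cpp
#include <cassert>

// Stand-in for the enum in nmtCommon.hpp; the NMT_unknown member and the
// ordering are assumptions taken from the review discussion.
enum NMT_TrackingLevel { NMT_unknown, NMT_off, NMT_summary, NMT_detail };

// Hypothetical stand-in for MemTracker, reduced to the level query.
class MemTrackerSketch {
  static NMT_TrackingLevel _level;

 public:
  static void set_level(NMT_TrackingLevel level) { _level = level; }
  static NMT_TrackingLevel tracking_level() { return _level; }

  // Shortcut for "level > off", i.e. summary or detail tracking is active.
  static bool is_enabled() { return tracking_level() > NMT_off; }
};

NMT_TrackingLevel MemTrackerSketch::_level = NMT_off;

// Call sites can then read 'if (is_enabled())' instead of comparing levels.
inline void is_enabled_demo() {
  MemTrackerSketch::set_level(NMT_summary);
  assert(MemTrackerSketch::is_enabled());
  MemTrackerSketch::set_level(NMT_off);
  assert(!MemTrackerSketch::is_enabled());
}
```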
------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From neliasso at openjdk.java.net Thu Dec 2 18:13:30 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 2 Dec 2021 18:13:30 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 00:10:39 GMT, Sandhya Viswanathan wrote: >> Currently 32-byte instructions are used for small array copy and clear. >> This can be optimized by using 64-byte instructions. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6512 From zgu at openjdk.java.net Thu Dec 2 18:23:07 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 2 Dec 2021 18:23:07 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability [v2] In-Reply-To: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: <_sVZGs-41bTQMUdjZMliWfhr9E4UopWGDlZONZ3-Ulc=.17dd56b1-a2fe-4790-abe0-3b15924b8b9f@github.com> > NMT shutdown functionality is a remnant of its first implementation, which could consume excessive amount of memory, therefore, it needed capability to shut it self down to ensure health of JVM. This is no longer a case for current implementation. > > After JDK-8277946, there is no longer a use, so it can be removed. 
> > Test: > - [x] hotspot_nmt > - [x] tier1 with NMT on Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: tstuefe's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6640/files - new: https://git.openjdk.java.net/jdk/pull/6640/files/8a5bb2ee..58d22421 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6640&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6640&range=00-01 Stats: 34 lines in 10 files changed: 9 ins; 4 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/6640.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6640/head:pull/6640 PR: https://git.openjdk.java.net/jdk/pull/6640 From zgu at openjdk.java.net Thu Dec 2 18:23:07 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 2 Dec 2021 18:23:07 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability [v2] In-Reply-To: References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: On Thu, 2 Dec 2021 16:23:32 GMT, Thomas Stuefe wrote: > Nice cleanup. > > You missed a few: > > * test/lib/jdk/test/whitebox/WhiteBox.java, `NMTChangeTrackingLevel` > * test/lib/sun/hotspot/WhiteBox.java, same Out of curious, why have two versions? > > Will the removal of WB functions cause incompatibilities with older versions of jtreg (can't imagine it does, just wondering)? > No idea. > Cheers, Thomas > src/hotspot/share/services/nmtCommon.hpp line 67: > >> 65: NMT_off, >> 66: NMT_summary, >> 67: NMT_detail > > I'm fine with removing the explicit numbers here, but could you please add > > STATIC_ASSERT(off > unknown) > STATIC_ASSERT(summary > off) > STATIC_ASSERT(detail > summary) > > somewhere? (Maybe in the cpp file, its not such an eyesore there). Good idea, fixed. 
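The requested ordering checks could look roughly like the sketch below. It uses standard C++ `static_assert` rather than HotSpot's `STATIC_ASSERT` macro, and the enum is a local stand-in (wrapped in a made-up namespace `ordering_sketch`) that assumes an `NMT_unknown` member precedes `NMT_off`.

```cpp
#include <cassert>

namespace ordering_sketch {

// Stand-in enum without explicit numbers; enumerators count up from 0.
enum NMT_TrackingLevel { NMT_unknown, NMT_off, NMT_summary, NMT_detail };

// Compile-time guards: code that compares levels with '>' or '>=' relies
// on this order, so reordering the enum breaks the build instead of
// silently changing behaviour.
static_assert(NMT_off > NMT_unknown, "off must sort above unknown");
static_assert(NMT_summary > NMT_off, "summary must sort above off");
static_assert(NMT_detail > NMT_summary, "detail must sort above summary");

// Example of a comparison that depends on the ordering above.
inline bool at_least_summary(NMT_TrackingLevel level) {
  return level >= NMT_summary;
}

inline void ordering_demo() {
  assert(at_least_summary(NMT_detail));
  assert(!at_least_summary(NMT_off));
}

}  // namespace ordering_sketch
```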
------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From kvn at openjdk.java.net Thu Dec 2 18:24:15 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Dec 2021 18:24:15 GMT Subject: RFR: 8277893: Arraycopy stress tests [v2] In-Reply-To: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com> References: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com> Message-ID: <0XuwXvto0gl1o6RkEJ3Ui_4Jah0LzBfuFX13TUz6-Ow=.0347d85f-392b-4b7c-a7c3-65ff5573339b@github.com> On Wed, 1 Dec 2021 09:13:36 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. >> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. 
>> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 6m37.855s >> user 56m23.004s >> sys 0m20.148s >> >> # x86_32 (TR 3970X) >> real 11m22.877s >> user 168m8.137s >> sys 5m7.037s >> >> # x86_64 (i5-11500) >> real 15m55.424s >> user 118m0.969s >> sys 0m12.039s >> >> # AArch64 (ThunderX2) >> real 4m5.177s >> user 32m7.295s >> sys 0m19.689s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - Separate test group and hooks into hotspot_slow_compiler > - Trim down MAX_SIZE and explain the choice Most testing passed fine. I am still waiting results on linux-aarch64. But I got 1 timeout failure running next test on Windows-x64-debug with -XX:+UseZGC: `compiler/arraycopy/stress/TestStressObjectArrayCopy.java` reason: User specified action: run main/othervm/timeout=960 -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI StressArrayCopyDriver TestStressObjectArrayCopy Timeout information: elapsed time (seconds): 3920.744 One flags combination run can take up to 8 min and you have 20 of them: [2021-12-02T16:33:27.024914600Z] Waiting for completion for process 9764 [2021-12-02T16:41:27.136342300Z] Waiting for completion finished for process 9764 Windows VM image has 12 cores, 46Gb memory and AMD latest CPU. I don't think it HW is issue. But it could be something with OS at that time. On linux-x64-debug it took: `main: 1411.585 seconds` running with ZGC. Still more then your specified timeout `timeout=960` Please, check it. You can increase timeout or split testing. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Thu Dec 2 18:32:37 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Dec 2021 18:32:37 GMT Subject: RFR: 8278146: G1: Rework VM_G1Concurrent VMOp to clearly identify it as pause Message-ID: Our support engineers asked this: > I see these G1Concurrent safepoints in JDK17: > [0.064s][info][safepoint ] Safepoint "G1Concurrent", Time since last: 1666947 ns, Reaching safepoint: 79150 ns, At safepoint: 349999 ns, Total: 429149 ns > I've always thought that "concurrent" and "safepoint" are basically antonyms. > What is a G1Concurrent safepoint? How can a concurrent event require a safepoint? I agree that's confusing. This patch splits the VM_G1Concurrent into two exactly named VMOp-s, so that we get: [6.527s][info][gc ] GC(7) Pause Remark 64M->64M(224M) 218.847ms [6.527s][info][safepoint] Safepoint "G1PauseRemark", Time since last: 17493991 ns, Reaching safepoint: 506830 ns, At safepoint: 218950374 ns, Total: 219457204 ns [6.536s][info][gc ] GC(7) Pause Cleanup 71M->71M(224M) 0.177ms [6.536s][info][safepoint] Safepoint "G1PauseCleanup", Time since last: 8250157 ns, Reaching safepoint: 884967 ns, At safepoint: 223964 ns, Total: 1108931 ns [6.537s][info][gc ] GC(7) Concurrent Mark Cycle 247.051ms Additional testing: - [x] Linux x86_64 fastdebug `tier1` ------------- Commit messages: - Whitespace and touchups - Basic implementation Changes: https://git.openjdk.java.net/jdk/pull/6677/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6677&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278146 Stats: 69 lines in 4 files changed: 33 ins; 27 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/6677.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6677/head:pull/6677 PR: https://git.openjdk.java.net/jdk/pull/6677 From shade at openjdk.java.net Thu Dec 2 18:33:23 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Dec 2021 
18:33:23 GMT Subject: RFR: 8277893: Arraycopy stress tests [v2] In-Reply-To: <0XuwXvto0gl1o6RkEJ3Ui_4Jah0LzBfuFX13TUz6-Ow=.0347d85f-392b-4b7c-a7c3-65ff5573339b@github.com> References: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com> <0XuwXvto0gl1o6RkEJ3Ui_4Jah0LzBfuFX13TUz6-Ow=.0347d85f-392b-4b7c-a7c3-65ff5573339b@github.com> Message-ID: On Thu, 2 Dec 2021 18:20:43 GMT, Vladimir Kozlov wrote: > Please, check it. You can increase timeout or split testing. Yes, thank you, that's exactly why I wanted to do these tests ahead of the actual arraycopy changes. I'll take a look at what can be done. FWIW, my Windows VM running on TR 3970X passed these tests in reasonable time, maybe it is Windows+ZGC-specific problem here. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From stuefe at openjdk.java.net Thu Dec 2 18:33:25 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 2 Dec 2021 18:33:25 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability [v2] In-Reply-To: <_sVZGs-41bTQMUdjZMliWfhr9E4UopWGDlZONZ3-Ulc=.17dd56b1-a2fe-4790-abe0-3b15924b8b9f@github.com> References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> <_sVZGs-41bTQMUdjZMliWfhr9E4UopWGDlZONZ3-Ulc=.17dd56b1-a2fe-4790-abe0-3b15924b8b9f@github.com> Message-ID: On Thu, 2 Dec 2021 18:23:07 GMT, Zhengyu Gu wrote: >> NMT shutdown functionality is a remnant of its first implementation, which could consume excessive amount of memory, therefore, it needed capability to shut it self down to ensure health of JVM. This is no longer a case for current implementation. >> >> After JDK-8277946, there is no longer a use, so it can be removed. >> >> Test: >> - [x] hotspot_nmt >> - [x] tier1 with NMT on > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > tstuefe's comments Looks good to me. Thank you for doing this! 
..Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6640 From stuefe at openjdk.java.net Thu Dec 2 18:33:25 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 2 Dec 2021 18:33:25 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability [v2] In-Reply-To: References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: On Thu, 2 Dec 2021 18:16:20 GMT, Zhengyu Gu wrote: > > Nice cleanup. > > You missed a few: > > > > * test/lib/jdk/test/whitebox/WhiteBox.java, `NMTChangeTrackingLevel` > > * test/lib/sun/hotspot/WhiteBox.java, same > > Out of curious, why have two versions? No clue :) ------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From kvn at openjdk.java.net Thu Dec 2 18:49:16 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Dec 2021 18:49:16 GMT Subject: RFR: 8277893: Arraycopy stress tests [v2] In-Reply-To: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com> References: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com> Message-ID: <0-D8zfmV85KlFeIiWrC5LIb9Wu7Pj_VABLI-UuPS0t4=.d22226ad-54a7-4d48-bba1-7eabd6ed2c1c@github.com> On Wed, 1 Dec 2021 09:13:36 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. 
>> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. >> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 6m37.855s >> user 56m23.004s >> sys 0m20.148s >> >> # x86_32 (TR 3970X) >> real 11m22.877s >> user 168m8.137s >> sys 5m7.037s >> >> # x86_64 (i5-11500) >> real 15m55.424s >> user 118m0.969s >> sys 0m12.039s >> >> # AArch64 (ThunderX2) >> real 4m5.177s >> user 32m7.295s >> sys 0m19.689s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - Separate test group and hooks into hotspot_slow_compiler > - Trim down MAX_SIZE and explain the choice Yes, ZGC is definitely affecting it. 
With ParallelGC on linux-x64 the time was: `main: 551.014 seconds` ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Thu Dec 2 19:48:22 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Dec 2021 19:48:22 GMT Subject: RFR: 8277893: Arraycopy stress tests [v2] In-Reply-To: <0-D8zfmV85KlFeIiWrC5LIb9Wu7Pj_VABLI-UuPS0t4=.d22226ad-54a7-4d48-bba1-7eabd6ed2c1c@github.com> References: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com> <0-D8zfmV85KlFeIiWrC5LIb9Wu7Pj_VABLI-UuPS0t4=.d22226ad-54a7-4d48-bba1-7eabd6ed2c1c@github.com> Message-ID: On Thu, 2 Dec 2021 18:46:02 GMT, Vladimir Kozlov wrote: > Yes, ZGC is definitely affecting it. With ParallelGC on linux-x64 the time was: `main: 551.014 seconds` I suspect arraycopy GC barriers, because Shenandoah is also quite a bit slow. In exhaustive tests for small arrays, runtime calls probably dominate. I'll figure it out. # UseParallelGC real 6m32.100s user 54m19.617s sys 0m36.336s # UseG1GC real 6m31.220s user 55m50.315s sys 0m19.156s # UseSerialGC real 5m53.627s user 52m59.540s sys 0m29.012s # UseShenandoahGC real 11m1.101s user 65m26.868s sys 0m34.472s # UseZGC real 15m15.289s user 73m15.533s sys 0m31.396s ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From duke at openjdk.java.net Thu Dec 2 19:49:16 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Thu, 2 Dec 2021 19:49:16 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v3] In-Reply-To: References: <_uLnNO847-foX9jDiBc2nSNfiwhgl5pV-0WV5NXViZg=.4c73d6cb-966a-42f2-aade-87edfc69a316@github.com> Message-ID: On Wed, 1 Dec 2021 20:27:19 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixing review comments > > Nice work. Let me test it before approval. @vnkozlov Please let me know if the test passed. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6595 From kvn at openjdk.java.net Thu Dec 2 20:02:17 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Dec 2021 20:02:17 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v4] In-Reply-To: <41Q1gB0N_YPjDB9nr4l74d2o0QKaYKsjC60s9xg4nk8=.45d4275e-fd64-44aa-8adf-13e480612ea1@github.com> References: <41Q1gB0N_YPjDB9nr4l74d2o0QKaYKsjC60s9xg4nk8=.45d4275e-fd64-44aa-8adf-13e480612ea1@github.com> Message-ID: On Thu, 2 Dec 2021 00:20:52 GMT, Scott Gibbons wrote: >> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. >> >> 5986.947899319073 MB/s => 24041.05203089616 MB/s >> 5840.02689336947 MB/s => 24898.781468710356 MB/s >> >> ********** Original *********** >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 1.710387358 seconds >> CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds >> CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> >> >> >> >> *********** With my changes: ************* >> >> >> >> scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 >> offset = 0 >> msgSize = 512 bytes >> iters = 20000000 >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(byte[]) runtime = 0.425938099 
seconds >> CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds >> CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s >> CRCs: crc = ae10ee5a, crcReference = ae10ee5a >> ------------------------------------------------------- > > Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: > > - MICRO to MILLI as requested. > - Fixing benchmark to throughput with default iterations. Testing finally passed (at least on x86, aarch64 is still running). I takes long time to get results from several tiers. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6595 From sviswanathan at openjdk.java.net Thu Dec 2 20:09:21 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 2 Dec 2021 20:09:21 GMT Subject: RFR: 8277358: Accelerate CRC32-C [v4] In-Reply-To: References: <41Q1gB0N_YPjDB9nr4l74d2o0QKaYKsjC60s9xg4nk8=.45d4275e-fd64-44aa-8adf-13e480612ea1@github.com> Message-ID: On Thu, 2 Dec 2021 19:58:52 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: >> >> - MICRO to MILLI as requested. >> - Fixing benchmark to throughput with default iterations. > > Testing finally passed (at least on x86, aarch64 is still running). I takes long time to get results from several tiers. Thanks a lot @vnkozlov. 

-------------

PR: https://git.openjdk.java.net/jdk/pull/6595

From duke at openjdk.java.net Thu Dec 2 20:09:21 2021
From: duke at openjdk.java.net (Scott Gibbons)
Date: Thu, 2 Dec 2021 20:09:21 GMT
Subject: RFR: 8277358: Accelerate CRC32-C [v4]
In-Reply-To: <41Q1gB0N_YPjDB9nr4l74d2o0QKaYKsjC60s9xg4nk8=.45d4275e-fd64-44aa-8adf-13e480612ea1@github.com>
References: <41Q1gB0N_YPjDB9nr4l74d2o0QKaYKsjC60s9xg4nk8=.45d4275e-fd64-44aa-8adf-13e480612ea1@github.com>
Message-ID:

On Thu, 2 Dec 2021 00:20:52 GMT, Scott Gibbons wrote:

>> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement.
>>
>> 5986.947899319073 MB/s => 24041.05203089616 MB/s
>> 5840.02689336947 MB/s => 24898.781468710356 MB/s
>
> Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision:
>
> - MICRO to MILLI as requested.
> - Fixing benchmark to throughput with default iterations.

Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6595

From duke at openjdk.java.net Thu Dec 2 20:09:23 2021
From: duke at openjdk.java.net (Scott Gibbons)
Date: Thu, 2 Dec 2021 20:09:23 GMT
Subject: Integrated: 8277358: Accelerate CRC32-C
In-Reply-To:
References:
Message-ID:

On Mon, 29 Nov 2021 14:45:22 GMT, Scott Gibbons wrote:

> Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement.
> > 5986.947899319073 MB/s => 24041.05203089616 MB/s > 5840.02689336947 MB/s => 24898.781468710356 MB/s > > ********** Original *********** > > > scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 > offset = 0 > msgSize = 512 bytes > iters = 20000000 > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(byte[]) runtime = 1.710387358 seconds > CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds > CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > > > > > *********** With my changes: ************* > > > > scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 > offset = 0 > msgSize = 512 bytes > iters = 20000000 > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(byte[]) runtime = 0.425938099 seconds > CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds > CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- This pull request has now been integrated. 
Changeset: e0f1fc78 Author: Scott Gibbons Committer: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/e0f1fc783cb492dd1eb18f2d56c57bdc160a410d Stats: 126 lines in 5 files changed: 98 ins; 4 del; 24 mod 8277358: Accelerate CRC32-C Co-authored-by: Greg Tucker Co-authored-by: Scott Gibbons Reviewed-by: kvn, sviswanathan, ecaspole ------------- PR: https://git.openjdk.java.net/jdk/pull/6595 From duke at openjdk.java.net Thu Dec 2 20:32:53 2021 From: duke at openjdk.java.net (Vishal Chand) Date: Thu, 2 Dec 2021 20:32:53 GMT Subject: RFR: 8277372: Add getters for BOT and card table members [v5] In-Reply-To: References: Message-ID: <27FQ_IrnviiozDCog_Cn1wP6Wew-4yC9kHn41mgbUgY=.81500445-cd93-4290-809e-633e646a1217@github.com> > Changed the visibility, added getters and refactored the following: > > 1. Card Table Members > 2. BOT members > 3. ObjectStartArray block members Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: Rename BOTConstants and ObjectStartArray members to _card_* ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6570/files - new: https://git.openjdk.java.net/jdk/pull/6570/files/48828873..168e34e0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=03-04 Stats: 114 lines in 13 files changed: 0 ins; 0 del; 114 mod Patch: https://git.openjdk.java.net/jdk/pull/6570.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570 PR: https://git.openjdk.java.net/jdk/pull/6570 From eastig at amazon.co.uk Thu Dec 2 22:07:01 2021 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Thu, 2 Dec 2021 22:07:01 +0000 Subject: RFC: improving NMethod code locality in CodeCache Message-ID: Hi Lutz, Thank you for your comments. From the data I've got NMethod constants section does not take a lot of space: total of them is below 0.6%. This is similar for both x86_64 and arm64. 
I guess it should be similar for other architectures.

A decision to move stub code out depends on the stub code contribution. For x86_64 it is below 2%, so it might be kept with the main code. On arm64 it is currently up to 12%. I found a couple of issues in the generated stub code. Resolving them should reduce the arm64 stub code size. I am not sure it would be possible to get arm64 stub code below 2%.

> When considering performance, it is beneficial to have data which is being patched (frequently) separated from the instruction stream.

All CPUs I have worked with don't like self-modifying code and mixing code with modifiable data. An exception is literal pools holding constants embedded into code.

> Instruction stream compactness may have an influence if the prediction engine not only remembers the branch direction, but the (limited length) distance as well.

In "A Survey of Techniques for Dynamic Branch Prediction" (https://arxiv.org/pdf/1804.00261.pdf) I found that a distance between branches can be taken into account. I've seen this.

Thanks,
Evgeny

On 29/11/2021, 12:20, "Schmidt, Lutz" wrote:

Hi,

a few thoughts immediately popped up when reading Evgeny's RFC and Tobias' comments. If my comments seem influenced by s390x - that might well be. It's the architecture I know best.

- The biggest concern I have relates to pc-relative addressing.
  o nmethod constants are currently located next to the instruction section. Putting them into a separately allocated area may break the pc-relative limit. s390x limit: +/- 4GB, no fallback implemented.
  o relative branches either are
    + short distance, mostly intra-nmethod
    + long distance, mostly inter-nmethod
    + not possible in general, e.g., runtime calls
    The branch optimization (in shorten_branches) might less often be possible. One example would be if stub code is moved to a separately allocated area.
- When considering performance, it is beneficial to have data which is being patched (frequently) separated from the instruction stream.
  s390x: never modify data in a cache line where instructions are fetched from. That will kill your performance big time.
- I'm not a branch prediction expert. Instruction stream compactness may have an influence if the prediction engine not only remembers the branch direction, but the (limited length) distance as well.

Thanks,
Lutz

On 29.11.21, 10:03, "hotspot-dev on behalf of Tobias Hartmann" wrote:

Hi Evgeny,

Thanks for sharing these results and starting the discussion. Some comments below.

On 23.11.21 18:34, Astigeevich, Evgeny wrote:
> We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.

Is it really a problem with branch prediction or more with instruction caching? With the current implementation, the hot instructions of a single nmethod are already contiguous but different nmethods might be located far away (and there's lots of metadata in-between). (Re-)moving the metadata will improve locality but does that really have an effect on branch prediction?

Did you gather some numbers via hardware performance counters (iCache, ITLB, branch prediction misses)?

> The data show that due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show the most contributors of non-executable data are the header and scopes sections. Arm64 vs x86_64 looks consistent except the stub code. On arm64 the size of the stub code is 4-5 times bigger.
>
> We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data.

It would definitely be nice to have this as an option (rather than replacing the current implementation) but I wonder how feasible it is. There is lots of code that depends on the current layout and we would need to make all of that dependent on a flag.
> According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache. Different parts of memory for a nmethod need to be allocated/released.

Ever since I finished the implementation of the Segmented Code Cache (https://openjdk.java.net/jeps/197), I wanted to work on this but never got to it. I think that the additional complexity in the code cache is worth it but of course that has to be proven by a performance evaluation.
For reference, here's my old thesis and the paper we published back then:
http://cr.openjdk.java.net/~thartmann/papers/2014-Code_Cache_Optimizations-thesis.pdf
http://cr.openjdk.java.net/~thartmann/papers/2014-PPPJ-Efficient_Code_Cache_Management.pdf

> There is JDK-7072317 "move metadata from CodeCache" (https://bugs.openjdk.java.net/browse/JDK-7072317) under which the implementation work can be done.

Yes, that makes sense.

> There can be different approaches for the implementation:
>
> 1. What to separate:
> a. All code (main plus stub) from other sections.
> b. Or only main code because this is the code where an application should spend most of the time.
> c. Or the header and scope sections.

I would say that from a performance perspective, only the main code matters because the stubs are used for slow paths. If it simplifies prototyping, I would go with b) first.

> 2.
Where to put: > a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin. > b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset. > c. Or in a completely different place (C-heap, Metaspace,...) It depends on what we want to improve: (i) Code locality in the same nmethod or (ii) code locality between different nmethods. Solution b) would only improve code locality in the same nmethod but the overall layout of executable code in the code cache would still be sparse. I think c) would be the ideal solution: The code cache would only contain executable code and all the metadata would be somewhere else. But solution a) would lead to the same layout and might be easier to implement. > It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property. Yes, that is a concern. A thorough performance evaluation is required. > We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317. Hope that helps. I'm curious what others think. Best regards, Tobias Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. 
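
Tobias' distinction between (i) locality within one nmethod and (ii) locality across nmethods can be illustrated with a toy model. The sketch below is not HotSpot code: the class CodeLayoutSketch, the NMethodSizes fields, the byte sizes and the 4 KiB page size are all assumptions made only for illustration. It counts how many pages contain at least one byte of main code when each nmethod's code sits between its own non-executable data (interleaved) versus when all code is packed back to back (the kind of layout that options 2a and 2c aim for).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; none of these names or numbers come from HotSpot.
final class CodeLayoutSketch {
    /** Hypothetical sizes in bytes for one nmethod; codeBytes must be positive. */
    static final class NMethodSizes {
        final int dataBefore; // stands in for header, relocation, constants
        final int codeBytes;  // main code
        final int dataAfter;  // stands in for oops, metadata, scopes, tables

        NMethodSizes(int dataBefore, int codeBytes, int dataAfter) {
            this.dataBefore = dataBefore;
            this.codeBytes = codeBytes;
            this.dataAfter = dataAfter;
        }
    }

    /** Number of pages that contain at least one byte of main code. */
    static int pagesHoldingCode(List<NMethodSizes> nmethods, boolean separated, int pageSize) {
        Set<Long> pages = new HashSet<>();
        long address = 0;
        for (NMethodSizes m : nmethods) {
            if (!separated) {
                address += m.dataBefore; // data precedes the code in the same region
            }
            long firstPage = address / pageSize;
            long lastPage = (address + m.codeBytes - 1) / pageSize;
            for (long page = firstPage; page <= lastPage; page++) {
                pages.add(page);
            }
            address += m.codeBytes;
            if (!separated) {
                address += m.dataAfter; // data follows the code in the same region
            }
        }
        return pages.size();
    }

    public static void main(String[] args) {
        List<NMethodSizes> nmethods = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            nmethods.add(new NMethodSizes(512, 1024, 2560)); // 4096 bytes per nmethod
        }
        System.out.println("interleaved: " + pagesHoldingCode(nmethods, false, 4096)); // 4
        System.out.println("separated:   " + pagesHoldingCode(nmethods, true, 4096));  // 1
    }
}
```

With these made-up sizes the same 4 KiB of code is spread over four pages when interleaved and fits in one page when separated. Real effects on the iTLB, the instruction cache and branch prediction would still need the hardware-counter measurements asked for in the thread.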

From duke at openjdk.java.net Thu Dec 2 22:31:42 2021
From: duke at openjdk.java.net (Mai Đặng Quân Anh)
Date: Thu, 2 Dec 2021 22:31:42 GMT
Subject: RFR: 8278171: [vectorapi] Mask incorrectly computed for zero extending cast
In-Reply-To:
References:
Message-ID:

On Wed, 1 Dec 2021 12:03:15 GMT, Mai Đặng Quân Anh wrote:

> This patch implements vector unsigned upcast intrinsics for x86. I also fixed a bug in the current implementation where the zero extension masks are computed incorrectly and added relevant tests.
>
> Thank you very much.

@PaulSandoz Could you take a look at this PR? Also, could you create an issue for this PR, please. Should this be split into two, where the first one fixes the bug and adds tests while the second one implements the intrinsics?

Thank you very much.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6634

From duke at openjdk.java.net Thu Dec 2 22:31:42 2021
From: duke at openjdk.java.net (Mai Đặng Quân Anh)
Date: Thu, 2 Dec 2021 22:31:42 GMT
Subject: RFR: 8278171: [vectorapi] Mask incorrectly computed for zero extending cast
In-Reply-To: <2MpshpgM8Q5on_qdNcAqweKT6Pe-431p0sKC4fJXRqk=.6129cf7b-79aa-4282-a1a0-2a080ded63b6@github.com>
References: <2MpshpgM8Q5on_qdNcAqweKT6Pe-431p0sKC4fJXRqk=.6129cf7b-79aa-4282-a1a0-2a080ded63b6@github.com>
Message-ID:

On Thu, 2 Dec 2021 19:55:40 GMT, Paul Sandoz wrote:

>> This patch implements vector unsigned upcast intrinsics for x86. I also fixed a bug in the current implementation where the zero extension masks are computed incorrectly and added relevant tests.
>>
>> Thank you very much.
>
> I am inclined to separate this out. Fix the bug, add the tests, and integrate for 18. Then enhance with the intrinsics for 19.
>
> If you agree to that I will create two bugs.

@PaulSandoz Yes, I think that should be the case, thank you very much.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6634

From psandoz at openjdk.java.net Thu Dec 2 22:31:43 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Thu, 2 Dec 2021 22:31:43 GMT
Subject: RFR: 8278171: [vectorapi] Mask incorrectly computed for zero extending cast
In-Reply-To:
References: <2MpshpgM8Q5on_qdNcAqweKT6Pe-431p0sKC4fJXRqk=.6129cf7b-79aa-4282-a1a0-2a080ded63b6@github.com>
Message-ID:

On Thu, 2 Dec 2021 22:04:36 GMT, Mai Đặng Quân Anh wrote:

>> I am inclined to separate this out. Fix the bug, add the tests, and integrate for 18. Then enhance with the intrinsics for 19.
>>
>> If you agree to that I will create two bugs.
>
> @PaulSandoz Yes, I think that should be the case, thank you very much.

@merykitty here you go:

[vectorapi] Mask incorrectly computed for zero extending cast
https://bugs.openjdk.java.net/browse/JDK-8278171

[vectorapi] Add x64 intrinsics for unsigned (zero extended) casts
https://bugs.openjdk.java.net/browse/JDK-8278173

-------------

PR: https://git.openjdk.java.net/jdk/pull/6634

From duke at openjdk.java.net Thu Dec 2 22:31:39 2021
From: duke at openjdk.java.net (Mai Đặng Quân Anh)
Date: Thu, 2 Dec 2021 22:31:39 GMT
Subject: RFR: 8278171: [vectorapi] Mask incorrectly computed for zero extending cast
Message-ID:

This patch implements vector unsigned upcast intrinsics for x86. I also fixed a bug in the current implementation where the zero extension masks are computed incorrectly and added relevant tests.

Thank you very much.
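
For context, a zero-extending (unsigned) upcast keeps the source bits and clears everything above them, which in scalar Java amounts to masking with the width of the source type. The sketch below only illustrates that idea with plain scalar code. It is not the x86 backend code touched by this PR, and the class and method names are made up for the example. The last helper shows one way a wrongly sized mask silently yields a sign-extended result; it illustrates the general pitfall and is not a description of the actual bug fixed here.

```java
// Scalar illustration only; ZeroExtendSketch and its methods are hypothetical names.
final class ZeroExtendSketch {
    /** byte -> int, zero-extended: keep the low 8 bits. */
    static int byteToInt(byte b) {
        return b & 0xFF;
    }

    /** short -> int, zero-extended: keep the low 16 bits. */
    static int shortToInt(short s) {
        return s & 0xFFFF;
    }

    /** int -> long, zero-extended: the mask must be a long literal. */
    static long intToLong(int i) {
        return i & 0xFFFF_FFFFL;
    }

    /**
     * Pitfall: 0xFFFFFFFF is an int literal equal to -1, so the AND is a no-op
     * computed in int, and the result is then sign-extended to long.
     */
    static long intToLongWrongMask(int i) {
        return i & 0xFFFFFFFF;
    }

    public static void main(String[] args) {
        System.out.println(byteToInt((byte) -1));    // 255
        System.out.println(shortToInt((short) -1));  // 65535
        System.out.println(intToLong(-1));           // 4294967295
        System.out.println(intToLongWrongMask(-1));  // -1
    }
}
```

The first three helpers agree with the library methods Byte.toUnsignedInt, Short.toUnsignedInt and Integer.toUnsignedLong.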

-------------

Commit messages:
 - revert intrinsics
 - Merge branch 'master' into vectorUnsignedCastIntrinsics
 - retain relevant changes

Changes: https://git.openjdk.java.net/jdk/pull/6634/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6634&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8278171
Stats: 322 lines in 2 files changed: 321 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/6634.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6634/head:pull/6634

PR: https://git.openjdk.java.net/jdk/pull/6634

From psandoz at openjdk.java.net Thu Dec 2 22:31:41 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Thu, 2 Dec 2021 22:31:41 GMT
Subject: RFR: 8278171: [vectorapi] Mask incorrectly computed for zero extending cast
In-Reply-To:
References:
Message-ID: <2MpshpgM8Q5on_qdNcAqweKT6Pe-431p0sKC4fJXRqk=.6129cf7b-79aa-4282-a1a0-2a080ded63b6@github.com>

On Wed, 1 Dec 2021 12:03:15 GMT, Mai Đặng Quân Anh wrote:

> This patch implements vector unsigned upcast intrinsics for x86. I also fixed a bug in the current implementation where the zero extension masks are computed incorrectly and added relevant tests.
>
> Thank you very much.

I am inclined to separate this out. Fix the bug, add the tests, and integrate for 18. Then enhance with the intrinsics for 19.

If you agree to that I will create two bugs.

src/hotspot/cpu/x86/x86.ad line 1819:

> 1817: return false;
> 1818: }
> 1819: break;

Collapse cases, since each has the same code?
------------- PR: https://git.openjdk.java.net/jdk/pull/6634 From kvn at openjdk.java.net Thu Dec 2 22:57:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Dec 2021 22:57:14 GMT Subject: RFR: 8277893: Arraycopy stress tests [v2] In-Reply-To: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com> References: <3UVCSS5k6cAvWIzxF_1egpBrr69f5Bu8AlhWCMmmINw=.9dd5b6d5-375b-481d-a4e9-cb2ef7a1629f@github.com> Message-ID: On Wed, 1 Dec 2021 09:13:36 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. >> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. 
>> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 6m37.855s >> user 56m23.004s >> sys 0m20.148s >> >> # x86_32 (TR 3970X) >> real 11m22.877s >> user 168m8.137s >> sys 5m7.037s >> >> # x86_64 (i5-11500) >> real 15m55.424s >> user 118m0.969s >> sys 0m12.039s >> >> # AArch64 (ThunderX2) >> real 4m5.177s >> user 32m7.295s >> sys 0m19.689s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - Separate test group and hooks into hotspot_slow_compiler > - Trim down MAX_SIZE and explain the choice Just let you know that testing (tier1,2,3) on linux-aarch64 passed clean. All testing now finished. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From dholmes at openjdk.java.net Fri Dec 3 03:41:19 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 3 Dec 2021 03:41:19 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 00:10:39 GMT, Sandhya Viswanathan wrote: >> Currently 32-byte instructions are used for small array copy and clear. >> This can be optimized by using 64-byte instructions. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace Marked as reviewed by dholmes (Reviewer). Please check the GHA test results before integrating - there is a failing compiler test. 

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From david.holmes at oracle.com Fri Dec 3 03:54:57 2021
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 3 Dec 2021 13:54:57 +1000
Subject: RFR: 8277990: NMT: Remove NMT shutdown capability [v2]
In-Reply-To:
References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com>
Message-ID:

On 3/12/2021 4:23 am, Zhengyu Gu wrote:
> On Thu, 2 Dec 2021 16:23:32 GMT, Thomas Stuefe wrote:
>
>> Nice cleanup.
>>
>> You missed a few:
>>
>> * test/lib/jdk/test/whitebox/WhiteBox.java, `NMTChangeTrackingLevel`
>> * test/lib/sun/hotspot/WhiteBox.java, same
>
> Out of curiosity, why have two versions?

Historical. WB is primarily a hotspot test aid and resided in the hotspot testlibrary in the hotspot repo. Over time we've been moving to a common testlibrary for everyone now that we are all in one repo. No idea where we are with that consolidation process though. I think the person driving that may no longer be around, not sure.

>> Will the removal of WB functions cause incompatibilities with older versions of jtreg (can't imagine it does, just wondering)?

If you mean in terms of @require checking that involves WB, I believe that support class is part of the repo, not jtreg.

Cheers,
David
-----

>>
> No idea.
>
>> Cheers, Thomas
>
>> src/hotspot/share/services/nmtCommon.hpp line 67:
>>
>>> 65: NMT_off,
>>> 66: NMT_summary,
>>> 67: NMT_detail
>>
>> I'm fine with removing the explicit numbers here, but could you please add
>>
>> STATIC_ASSERT(off > unknown)
>> STATIC_ASSERT(summary > off)
>> STATIC_ASSERT(detail > summary)
>>
>> somewhere? (Maybe in the cpp file, it's not such an eyesore there).
>
> Good idea, fixed.
> > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/6640 > From dholmes at openjdk.java.net Fri Dec 3 04:50:22 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 3 Dec 2021 04:50:22 GMT Subject: RFR: 8278143: Remove unused "argc" from ConstantPool::copy_bootstrap_arguments_at_impl In-Reply-To: References: Message-ID: On Thu, 2 Dec 2021 12:49:45 GMT, Aleksey Shipilev wrote: > SonarCloud complains about variable initialization in large conditional statements. In that tower of if-expressions, it is not even clear to me that `argc` would be properly initialized. Seems better to clean that up. > > Additional testing: > - [x] Linux x86_64 fastdebug build > - [ ] GHA Good catch! :) It has been unused since it was introduced by JDK-8186209 argc is still mentioned in the comment on line 1183. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6671 From shade at openjdk.java.net Fri Dec 3 08:23:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 3 Dec 2021 08:23:41 GMT Subject: RFR: 8278143: Remove unused "argc" from ConstantPool::copy_bootstrap_arguments_at_impl [v2] In-Reply-To: References: Message-ID: > SonarCloud complains about variable initialization in large conditional statements. In that tower of if-expressions, it is not even clear to me that `argc` would be properly initialized. Seems better to clean that up. 
>
> Additional testing:
> - [x] Linux x86_64 fastdebug build
> - [ ] GHA

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Tidy up the comment

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/6671/files
 - new: https://git.openjdk.java.net/jdk/pull/6671/files/18fba172..a2718459

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6671&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6671&range=00-01

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/6671.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6671/head:pull/6671

PR: https://git.openjdk.java.net/jdk/pull/6671

From shade at openjdk.java.net Fri Dec 3 08:23:43 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Fri, 3 Dec 2021 08:23:43 GMT
Subject: RFR: 8278143: Remove unused "argc" from ConstantPool::copy_bootstrap_arguments_at_impl [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 3 Dec 2021 04:47:39 GMT, David Holmes wrote:

> argc is still mentioned in the comment on line 1183.

Right! Fixed in new commit.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6671

From nils.eliasson at oracle.com Fri Dec 3 10:36:55 2021
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Fri, 3 Dec 2021 11:36:55 +0100
Subject: RFC: improving NMethod code locality in CodeCache
In-Reply-To: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com>
References: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com>
Message-ID: <30fac845-5e82-347d-daac-873750860969@oracle.com>

Hi Evgeny,

In the context of this it might be a good time to also reconsider the code cache structure and evaluate other designs that might be a better fit with separate metadata.

The current code cache has three separate heaps of fixed size for three different kinds of code blobs - profiled, nonprofiled and adapters. There is some flexibility that allows the heaps to overflow into the other heaps in low memory situations.

What is desirable for the future is to keep the separation of different kinds of data, but have more granularity, and perhaps different granularity on different platforms. There must be flexibility in the separation so that different parts can grow on demand. It would also be nice to be able to support uncommit so that the code cache can shrink.

One such design could be to have a code cache that consists of blocks. One block is a continuous part of memory that holds a specific type of contents, like c2 nmethods. Blocks can be allocated on demand, and blocks can be adapted to fit nicely with TLB page sizes. An empty block could be deallocated and uncommitted if desired. On x86, code and stubs could be kept together, and metadata kept in a separate block. On AArch64, stubs might have their own block type. Blocks could also be of different sizes. Adapter blocks might be of a small size so that less memory is wasted, while blocks for profiled code initially get a big block.

This scheme would also hopefully avoid some of the sizing problems that the current code cache has. We wouldn't need to guess the need of a specific type - it would be allocated on demand. It would be easy to experiment with different kinds of divisions of data. Just add a new block type.

This solution would also facilitate more granular locking of the code cache where allocation or traversal of different blocks can be done independently.

What do you think?

Best regards,
Nils

On 2021-11-23 18:34, Astigeevich, Evgeny wrote:
> Hello,
>
> We'd like to discuss a proposal for improving NMethod code locality in CodeCache.
>
> We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.
>
> The current NMethod layout is continuous and consists of the following sections:
> * Header: This is C++ part of NMethod: class members and other C++ stuff. Its size is "sizeof(NMethod)". Jdk17 arm64 has it to be 344 bytes. On x86_64 it is 352 bytes.
> * Relocation
> * Constant pool
> * Instructions (main code)
> * Stub code
> * Oops
> * Metadata: Class related metadata
> * Scopes data: Debugging information
> * Scopes pcs: Debugging information
> * Dependencies
> * Handler table: Exception handler table
> * Nul chk table: Implicit Null Pointer exception table
> * Speculations
> * JVMCI data
>
> We collected the section sizes of C2 nmethods in the DaCapo and Renaissance benchmarks on x86_64 and arm64. The C2 methods were got with -XX:+LogCompilation.
> Summary of results for jdk17 with tiered compilation:
> * DaCapo:
> * arm64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_arm64.csv):
> +---------------------+---------+------------+-----------+
> |                     | min     | max        | median    |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 152     | 5215       | 916       |
> | Total size - bytes  | 271,576 | 38,367,872 | 4,072,616 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> | Section       | min   | max   | median |
> +---------------+-------+-------+--------+
> | header        | 4.7%  | 19.3% | 8.0%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 39.7% | 49.7% | 44.5%  |
> | stub code     | 8.9%  | 11.3% | 10.1%  |
> | oops          | 0.2%  | 0.4%  | 0.3%   |
> | metadata      | 2.0%  | 3.0%  | 2.3%   |
> | scopes data   | 12.2% | 18.6% | 15.9%  |
> | scopes pcs    | 7.8%  | 9.0%  | 8.4%   |
> | deps          | 0.3%  | 0.8%  | 0.5%   |
> | handler table | 1.3%  | 3.3%  | 2.1%   |
> | nul_chk table | 1.0%  | 1.6%  | 1.6%   |
> +---------------+-------+-------+--------+
>
> * x86_64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_x86_64.csv):
> +---------------------+---------+------------+-----------+
> |                     | min     | max        | median    |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 155     | 5135       | 889       |
> | Total size - bytes  | 264,800 | 35,026,312 | 3,985,744 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> | Section       | min   | max   | median |
> +---------------+-------+-------+--------+
> | header        | 5.2%  | 20.6% | 8.3%   |
> | consts        | 0.0%  | 0.6%  | 0.1%   |
> | instrs        | 49.2% | 60.7% | 55.3%  |
> | stub code     | 1.1%  | 1.9%  | 1.4%   |
> | oops          | 0.1%  | 0.3%  | 0.2%   |
> | metadata      | 1.6%  | 2.9%  | 2.0%   |
> | scopes data   | 12.2% | 19.6% | 16.8%  |
> | scopes pcs    | 7.8%  | 9.2%  | 8.5%   |
> | deps          | 0.3%  | 0.8%  | 0.5%   |
> | handler table | 1.5%  | 3.5%  | 2.0%   |
> | nul_chk table | 0.9%  | 1.6%  | 1.1%   |
> +---------------+-------+-------+--------+
>
> * Renaissance
> * arm64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_arm64.csv):
> +---------------------+---------+------------+-----------+
> |                     | min     | max        | median    |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 155     | 7447       | 1198      |
> | Total size - bytes  | 366,248 | 52,840,528 | 4,989,392 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> | Section       | min   | max   | median |
> +---------------+-------+-------+--------+
> | header        | 4.8%  | 14.6% | 8.5%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 35.7% | 45.6% | 42.8%  |
> | stub code     | 8.3%  | 12.0% | 10.1%  |
> | oops          | 0.2%  | 0.6%  | 0.4%   |
> | metadata      | 2.0%  | 4.1%  | 3.0%   |
> | scopes data   | 12.4% | 20.8% | 16.1%  |
> | scopes pcs    | 7.8%  | 8.9%  | 8.4%   |
> | deps          | 0.4%  | 1.0%  | 0.5%   |
> | handler table | 1.2%  | 3.9%  | 2.4%   |
> | nul_chk table | 0.9%  | 1.3%  | 1.1%   |
> +---------------+-------+-------+--------+
>
> * x86_64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_x86_64.csv):
>
> +---------------------+---------+------------+-----------+
> |                     | min     | max        | median    |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 158     | 7242       | 938       |
> | Total size - bytes  | 354,952 | 47,019,560 | 3,791,764 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> | Section       | min   | max   | median |
> +---------------+-------+-------+--------+
> | header        | 5.4%  | 15.7% | 9.7%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 46.1% | 54.4% | 52.7%  |
> | stub code     | 1.3%  | 1.9%  | 1.4%   |
> | oops          | 0.2%  | 0.5%  | 0.3%   |
> | metadata      | 1.9%  | 3.4%  | 2.6%   |
> | scopes data   | 12.7% | 23.6% | 17.4%  |
> | scopes pcs    | 8.0%  | 9.4%  | 8.6%   |
> | deps          | 0.4%  | 1.0%  | 0.5%   |
> | handler table | 1.3%  | 4.0%  | 2.5%   |
> | nul_chk table | 1.0%  | 1.4%  | 1.2%   |
> +---------------+-------+-------+--------+
>
> The data show that due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show the most contributors of non-executable data are the header and scopes sections. Arm64 vs x86_64 looks consistent except the stub code. On arm64 the size of the stub code is 4-5 times bigger.
>
> We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data. According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache.
> Different parts of memory for an nmethod need to be allocated/released.
>
> There is JDK-7072317 "move metadata from CodeCache" (https://bugs.openjdk.java.net/browse/JDK-7072317), under which the implementation work can be done.
>
> There can be different approaches for the implementation:
>
> 1. What to separate:
>    a. All code (main plus stub) from other sections.
>    b. Or only main code because this is the code where an application should spend most of the time.
>    c. Or the header and scope sections.
> 2. Where to put:
>    a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin.
>    b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset.
>    c. Or in a completely different place (C-heap, Metaspace,...)
>
> It needs to be investigated whether separating sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property.
>
> We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317.
>
> Comments welcome!
>
> Thanks,
> Evgeny Astigeevich, AWS Corretto Team
>
> Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.

From duke at openjdk.java.net Fri Dec 3 12:03:45 2021
From: duke at openjdk.java.net (Vishal Chand)
Date: Fri, 3 Dec 2021 12:03:45 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v6]
In-Reply-To:
References:
Message-ID:

> Changed the visibility, added getters and refactored the following:
>
> 1. Card Table Members
> 2. BOT members
> 3. ObjectStartArray block members

Vishal Chand has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:

 - Merge branch 'master' into JDK-8277372-refactor
 - Rename BOTConstants and ObjectStartArray members to _card_*
 - Rename BOTConstants
 - Merge branch 'master' into JDK-8277372-refactor
 - Refactoring in hotspot/cpu dir
 - Initial patch

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6570/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=05
  Stats: 217 lines in 40 files changed: 41 ins; 12 del; 164 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6570.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570

PR: https://git.openjdk.java.net/jdk/pull/6570

From ayang at openjdk.java.net Fri Dec 3 12:31:13 2021
From: ayang at openjdk.java.net (Albert Mingkun Yang)
Date: Fri, 3 Dec 2021 12:31:13 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 3 Dec 2021 12:03:45 GMT, Vishal Chand wrote:

>> Changed the visibility, added getters and refactored the following:
>>
>> 1. Card Table Members
>> 2. BOT members
>> 3. ObjectStartArray block members
>
> Vishal Chand has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>
>  - Merge branch 'master' into JDK-8277372-refactor
>  - Rename BOTConstants and ObjectStartArray members to _card_*
>  - Rename BOTConstants
>  - Merge branch 'master' into JDK-8277372-refactor
>  - Refactoring in hotspot/cpu dir
>  - Initial patch

Just a very minor comment.

src/hotspot/share/gc/parallel/objectStartArray.cpp line 47:

> 45:   // We're based on the assumption that we use the same
> 46:   // size blocks as the card table.
> 47:   assert((int)_card_size == (int)(CardTable::card_size()), "Sanity");

I think the `int` cast can be dropped; both sides are `uint`.
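
The remark about the cast can be illustrated with a small stand-alone sketch. It assumes, as stated in the comment above, that both values are `uint` (spelled `unsigned int` here). The two structs are hypothetical stand-ins written for this example, not the HotSpot declarations, and the value 512 is only illustrative.

```cpp
#include <type_traits>

// Hypothetical stand-in for the card table; not the HotSpot class.
struct CardTable {
  static unsigned int card_size() { return 512; }  // illustrative value
};

// Hypothetical stand-in for the object start array; not the HotSpot class.
struct ObjectStartArray {
  static unsigned int _card_size;

  // Compares the two sizes directly. Both operands have the same unsigned
  // type, so no conversion takes place and no cast is needed.
  static bool matches_card_table() {
    static_assert(std::is_same<decltype(_card_size),
                               decltype(CardTable::card_size())>::value,
                  "both sides are expected to have the same type");
    return _card_size == CardTable::card_size();
  }
};

unsigned int ObjectStartArray::_card_size = 512;  // illustrative value
```

With matching types on both sides, the check in the review would reduce to `assert(_card_size == CardTable::card_size(), "Sanity");`.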

-------------

Marked as reviewed by ayang (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6570

From duke at openjdk.java.net Fri Dec 3 12:37:17 2021
From: duke at openjdk.java.net (xpbob)
Date: Fri, 3 Dec 2021 12:37:17 GMT
Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Nov 2021 11:04:44 GMT, xpbob wrote:

>> Unsafe is used in many Java frameworks.
>> When a framework has an unsafe memory leak, there is no way to know what code is causing it.
>> Add unsafe allocation event to jfr.
>> Records the size and stack allocated.
>> This event is off by default
>
> xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
>  - Merge branch 'openjdk:master' into JDK-8277930
>  - remove whitespace
>  - add free and Reallocate event
>  - Merge branch 'openjdk:master' into JDK-8277930
>  - 8277930: Add unsafe allocation event to jfr

Thanks.
I used http://cr.openjdk.java.net/~jvernee/UnsafeTest.java
The results were about the same with and without the check.
I want to implement the Java event and test it again.

env: linux

code without unsafe event
---
c.unsafe.UnsafeTest.mallocAndFree     8  avgt  20  66.237 ± 0.672  ns/op
c.unsafe.UnsafeTest.mallocAndFree    12  avgt  20  66.596 ± 0.286  ns/op
c.unsafe.UnsafeTest.mallocAndFree    16  avgt  20  66.219 ± 0.621  ns/op
c.unsafe.UnsafeTest.mallocAndFree    24  avgt  20  66.715 ± 0.329  ns/op
c.unsafe.UnsafeTest.mallocAndFree    32  avgt  20  65.959 ± 0.714  ns/op
c.unsafe.UnsafeTest.mallocAndFree  4096  avgt  20  71.858 ± 3.306  ns/op
---

code with event commit
----
c.unsafe.UnsafeTest.mallocAndFree     8  avgt  20  68.402 ± 0.257  ns/op
c.unsafe.UnsafeTest.mallocAndFree    12  avgt  20  68.414 ± 0.276  ns/op
c.unsafe.UnsafeTest.mallocAndFree    16  avgt  20  68.405 ± 0.216  ns/op
c.unsafe.UnsafeTest.mallocAndFree    24  avgt  20  68.427 ± 0.219  ns/op
c.unsafe.UnsafeTest.mallocAndFree    32  avgt  20  68.339 ± 0.207  ns/op
c.unsafe.UnsafeTest.mallocAndFree  4096  avgt  20  74.645 ± 3.285  ns/op
----

code with should_commit
---
c.unsafe.UnsafeTest.mallocAndFree     8  avgt  20  68.342 ± 0.273  ns/op
c.unsafe.UnsafeTest.mallocAndFree    12  avgt  20  68.653 ± 0.359  ns/op
c.unsafe.UnsafeTest.mallocAndFree    16  avgt  20  68.421 ± 0.237  ns/op
c.unsafe.UnsafeTest.mallocAndFree    24  avgt  20  68.496 ± 0.205  ns/op
c.unsafe.UnsafeTest.mallocAndFree    32  avgt  20  68.370 ± 0.238  ns/op
c.unsafe.UnsafeTest.mallocAndFree  4096  avgt  20  75.110 ± 2.866  ns/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/6591

From sjohanss at openjdk.java.net Fri Dec 3 13:30:25 2021
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Fri, 3 Dec 2021 13:30:25 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 3 Dec 2021 12:03:45 GMT, Vishal Chand wrote:

>> Changed the visibility, added getters and refactored the following:
>>
>> 1. Card Table Members
>> 2. BOT members
>> 3. ObjectStartArray block members
>
> Vishal Chand has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>
>  - Merge branch 'master' into JDK-8277372-refactor
>  - Rename BOTConstants and ObjectStartArray members to _card_*
>  - Rename BOTConstants
>  - Merge branch 'master' into JDK-8277372-refactor
>  - Refactoring in hotspot/cpu dir
>  - Initial patch

Marked as reviewed by sjohanss (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From duke at openjdk.java.net Fri Dec 3 13:52:02 2021
From: duke at openjdk.java.net (Vishal Chand)
Date: Fri, 3 Dec 2021 13:52:02 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v7]
In-Reply-To:
References:
Message-ID:

> Changed the visibility, added getters and refactored the following:
>
> 1. Card Table Members
> 2. BOT members
> 3. ObjectStartArray block members

Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:

  Change to address @albertnetymk comment

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6570/files
  - new: https://git.openjdk.java.net/jdk/pull/6570/files/0f635162..35112e6b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=05-06

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6570.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570

PR: https://git.openjdk.java.net/jdk/pull/6570

From tschatzl at openjdk.java.net Fri Dec 3 17:24:17 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Fri, 3 Dec 2021 17:24:17 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v7]
In-Reply-To:
References:
Message-ID:

On Fri, 3 Dec 2021 13:52:02 GMT, Vishal Chand wrote:

>> Changed the visibility, added getters and refactored the following:
>>
>> 1. Card Table Members
>> 2. BOT members
>> 3. ObjectStartArray block members
>
> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision:
>
>   Change to address @albertnetymk comment

Lgtm.

-------------

Marked as reviewed by tschatzl (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6570

From sviswanathan at openjdk.java.net Fri Dec 3 18:18:19 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Fri, 3 Dec 2021 18:18:19 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6]
In-Reply-To:
References:
Message-ID: <-KrgC60_yf9oimwB2SnNx-f7z_EBTbm4-d2OqLzd-Nc=.f7f14de3-ea27-44f1-9488-34ac2ccf6f78@github.com>

On Fri, 3 Dec 2021 03:38:32 GMT, David Holmes wrote:

>> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix whitespace
>
> Please check the GHA test results before integrating - there is a failing compiler test.

@dholmes-ora I looked at the results. The failure in compiler/c2/irTests/TestUnsignedComparison.java for x86_32 is unrelated to this patch (https://bugs.openjdk.java.net/browse/JDK-8277324) and was fixed recently. A merge with master should fix that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6512

From sviswanathan at openjdk.java.net Fri Dec 3 18:24:00 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Fri, 3 Dec 2021 18:24:00 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v7]
In-Reply-To:
References:
Message-ID:

> Currently 32-byte instructions are used for small array copy and clear.
> This can be optimized by using 64-byte instructions.
>
> Please review.
>
> Best Regards,
> Sandhya

Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:

 - Merge branch 'master' of https://git.openjdk.java.net/jdk into copyclearopt
 - Fix whitespace
 - Implement review comments
 - Override threshold only if flag is default
 - update comment for avx3_threshold() with more details
 - restrict to Intel core and add comment
 - 8277617: Optimize array copy and clear on x86_64

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6512/files
  - new: https://git.openjdk.java.net/jdk/pull/6512/files/190f974c..9cbfc374

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=05-06

  Stats: 34253 lines in 918 files changed: 20994 ins; 7898 del; 5361 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6512.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512

PR: https://git.openjdk.java.net/jdk/pull/6512

From hohensee at amazon.com Fri Dec 3 18:35:05 2021
From: hohensee at amazon.com (Hohensee, Paul)
Date: Fri, 3 Dec 2021 18:35:05 +0000
Subject: RFC: improving NMethod code locality in CodeCache
Message-ID:

Hi, Lutz,

The constant pool would be in the same chunk as the executable code. It'd be perhaps more accurate to speak of separating "frequently accessed" and "infrequently accessed" parts of an nmethod.

Inlined oops have been around for a long time, but for those implementations where that's a bad idea, I believe there's an existing mechanism to put them in the constant pool instead. As long as the constant pool doesn't share cache lines with the executable code, we should be ok.

Thanks,
Paul

-----Original Message-----
From: hotspot-dev on behalf of "Schmidt, Lutz"
Date: Monday, November 29, 2021 at 4:20 AM
To: Tobias Hartmann, "Astigeevich, Evgeny", "hotspot-dev at openjdk.java.net"
Subject: Re: RFC: improving NMethod code locality in CodeCache

Hi,

a few thoughts immediately popped up when reading Evgeny's RFC and Tobias' comments.
If my comments seem influenced by s390x - that might well be. It's the architecture I know best.

- The biggest concern I have relates to pc-relative addressing.
  o nmethod constants are currently located next to the instruction section. Putting them into a separately allocated area may break the pc-relative limit. s390x limit: +/- 4GB, no fallback implemented.
  o relative branches either are
    + short distance, mostly intra-nmethod
    + long distance, mostly inter-nmethod
    + not possible in general, e.g., runtime calls
    The branch optimization (in shorten_branches) might less often be possible. One example would be if stub code is moved to a separately allocated area.
- When considering performance, it is beneficial to have data which is being patched (frequently) separated from the instruction stream. s390x: never modify data in a cache line where instructions are fetched from. That will kill your performance big time.
- I'm not a branch prediction expert. Instruction stream compactness may have an influence if the prediction engine not only remembers the branch direction, but the (limited length) distance as well.

Thanks,
Lutz

On 29.11.21, 10:03, "hotspot-dev on behalf of Tobias Hartmann" wrote:

Hi Evgeny,

Thanks for sharing these results and starting the discussion. Some comments below.

On 23.11.21 18:34, Astigeevich, Evgeny wrote:
> We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.

Is it really a problem with branch prediction or more with instruction caching? With the current implementation, the hot instructions of a single nmethod are already contiguous but different nmethods might be located far away (and there's lots of metadata in-between). (Re-)moving the metadata will improve locality but does that really have an effect on branch prediction?

Did you gather some numbers via hardware performance counters (iCache, ITLB, branch prediction misses)?

> The data show that, due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show that the largest contributors of non-executable data are the header and scopes sections. Arm64 and x86_64 look consistent except for the stub code. On arm64 the size of the stub code is 4-5 times bigger.
>
> We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data.

It would definitely be nice to have this as an option (rather than replacing the current implementation) but I wonder how feasible it is. There is lots of code that depends on the current layout and we would need to make all of that dependent on a flag.

> According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache. Different parts of memory for an nmethod need to be allocated/released.

Ever since I finished the implementation of the Segmented Code Cache (https://openjdk.java.net/jeps/197), I wanted to work on this but never got to it. I think that the additional complexity in the code cache is worth it but of course that has to be proven by a performance evaluation.

For reference, here's my old thesis and the paper we published back then:
http://cr.openjdk.java.net/~thartmann/papers/2014-Code_Cache_Optimizations-thesis.pdf
http://cr.openjdk.java.net/~thartmann/papers/2014-PPPJ-Efficient_Code_Cache_Management.pdf

> There is JDK-7072317 "move metadata from CodeCache" (https://bugs.openjdk.java.net/browse/JDK-7072317), under which the implementation work can be done.

Yes, that makes sense.

> There can be different approaches for the implementation:
>
> 1. What to separate:
>    a. All code (main plus stub) from other sections.
>    b. Or only main code because this is the code where an application should spend most of the time.
>    c. Or the header and scope sections.

I would say that from a performance perspective, only the main code matters because the stubs are used for slow paths. If it simplifies prototyping, I would go with b) first.

> 2. Where to put:
>    a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin.
>    b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset.
>    c. Or in a completely different place (C-heap, Metaspace,...)

It depends on what we want to improve: (i) Code locality in the same nmethod or (ii) code locality between different nmethods. Solution b) would only improve code locality in the same nmethod but the overall layout of executable code in the code cache would still be sparse. I think c) would be the ideal solution: The code cache would only contain executable code and all the metadata would be somewhere else. But solution a) would lead to the same layout and might be easier to implement.
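
One possible reading of option 2.b, as worded in the RFC ("code grows from lower addresses upwards and metadata from high addresses downwards"), is a two-ended bump allocator. The following is only a sketch under that assumption: `TwoEndedSegment` and its methods are hypothetical names that do not exist in HotSpot, and a real code heap would also need freeing, alignment and locking.

```cpp
#include <cstddef>

// Hypothetical sketch of one segment shared by code and metadata.
// Offsets are relative to the start of the segment.
class TwoEndedSegment {
 public:
  explicit TwoEndedSegment(std::size_t capacity)
      : _code_top(0), _data_bottom(capacity) {}

  // Reserves `size` bytes for code at the low end.
  // Returns the offset of the reserved range, or -1 if it does not fit.
  std::ptrdiff_t allocate_code(std::size_t size) {
    if (size > free_bytes()) return -1;
    std::size_t offset = _code_top;
    _code_top += size;
    return static_cast<std::ptrdiff_t>(offset);
  }

  // Reserves `size` bytes for metadata at the high end.
  // Returns the offset of the reserved range, or -1 if it does not fit.
  std::ptrdiff_t allocate_data(std::size_t size) {
    if (size > free_bytes()) return -1;
    _data_bottom -= size;
    return static_cast<std::ptrdiff_t>(_data_bottom);
  }

  // Bytes still available between the two growing ends.
  std::size_t free_bytes() const { return _data_bottom - _code_top; }

  // Bytes handed out for code so far.
  std::size_t code_bytes() const { return _code_top; }

 private:
  std::size_t _code_top;     // first free byte above the code area
  std::size_t _data_bottom;  // lowest byte used by the metadata area
};
```

In this sketch, successive code allocations receive consecutive offsets at the low end, while metadata allocations are handed out from the high end of the same segment.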

> It needs to be investigated whether separating sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property.

Yes, that is a concern. A thorough performance evaluation is required.

> We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317.

Hope that helps. I'm curious what others think.

Best regards,
Tobias

From hohensee at amazon.com Fri Dec 3 18:44:31 2021
From: hohensee at amazon.com (Hohensee, Paul)
Date: Fri, 3 Dec 2021 18:44:31 +0000
Subject: RFC: improving NMethod code locality in CodeCache
Message-ID:

Regionalized code cache. Excellent idea. Fwiw, I implemented one as part of a dynamic binary translator (x86 -> sparc) at Sun, see https://old.hotchips.org/wp-content/uploads/hc_archives/hc08/2_Mon/HC8.S2/HC8.2.1.pdf, slide 29. See also my comment on https://bugs.openjdk.java.net/browse/JDK-8015774.

Worked well in the binary translator context. It threw out the oldest code when full, even if the old code was hot, under the assumption that it would be quickly recompiled. We probably don't want to do that though.

Thanks,
Paul

-----Original Message-----
From: hotspot-dev on behalf of Nils Eliasson
Date: Friday, December 3, 2021 at 2:39 AM
To: "hotspot-dev at openjdk.java.net"
Subject: Re: RFC: improving NMethod code locality in CodeCache

Hi Evgeny,

In the context of this it might be a good time to also reconsider the code cache structure and evaluate other designs that might be a better fit with separate metadata.

The current code cache has three separate heaps of fixed size for three different kinds of code blobs - profiled, nonprofiled and adapters. There is some flexibility that allows the heaps to overflow into the other heaps in low memory situations.

What is desirable for the future is to keep the separation of different kinds of data, but have more granularity, and perhaps different granularity on different platforms. There must be flexibility in the separation so that different parts can grow on demand. It would also be nice to be able to support uncommit so that the code cache can shrink.

One such design could be to have a code cache that consists of blocks. One block is a continuous part of memory that holds a specific type of contents, like c2 nmethods. Blocks can be allocated on demand, and blocks can be adapted to fit nicely with TLB page sizes. An empty block could be deallocated and uncommitted if desired.

On x86, code and stubs could be kept together, and metadata kept in a separate block. On AArch64, stubs might have their own block type. Blocks could also be of different sizes. Adapter blocks might be of a small size so that less memory is wasted, while blocks for profiled code initially get a big block.

This scheme would also hopefully avoid some of the sizing problems that the current code cache has. We wouldn't need to guess the need of a specific type - it would be allocated on demand.

It would be easy to experiment with different kinds of divisions of data. Just add a new block type.

This solution would also facilitate more granular locking of the code cache where allocation or traversal of different blocks can be done independently.

What do you think?

Best regards,
Nils

From zgu at openjdk.java.net Fri Dec 3 19:30:12 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Fri, 3 Dec 2021 19:30:12 GMT
Subject: RFR: 8277990: NMT: Remove NMT shutdown capability [v2]
In-Reply-To:
References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> <_sVZGs-41bTQMUdjZMliWfhr9E4UopWGDlZONZ3-Ulc=.17dd56b1-a2fe-4790-abe0-3b15924b8b9f@github.com>
Message-ID:

On Thu, 2 Dec 2021 18:29:45 GMT, Thomas Stuefe wrote:

> Looks good to me.
Thank you for doing this! > > ..Thomas Thanks for the review, @tstuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From sviswanathan at openjdk.java.net Fri Dec 3 21:10:17 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 3 Dec 2021 21:10:17 GMT Subject: Integrated: 8277617: Adjust AVX3Threshold for copy/fill stubs In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 01:23:04 GMT, Sandhya Viswanathan wrote: > Currently 32-byte instructions are used for small array copy and clear. > This can be optimized by using 64-byte instructions. > > Please review. > > Best Regards, > Sandhya This pull request has now been integrated. Changeset: 24e16ac6 Author: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/24e16ac637095d7dee1d6fe34f996b68eedfa8bc Stats: 29 lines in 5 files changed: 15 ins; 0 del; 14 mod 8277617: Adjust AVX3Threshold for copy/fill stubs Reviewed-by: jbhateja, dholmes, neliasso, jiefu ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From coleenp at openjdk.java.net Sat Dec 4 13:08:18 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Sat, 4 Dec 2021 13:08:18 GMT Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 07:18:22 GMT, Thomas Stuefe wrote: > This bypasses the currently observed problem, but we still have a fundamentally unsafe mechanism in use here. :( Definitely. I think having some assert code that verifies that we don't do anything "unsafe" while in AsyncGetCallTrace might be a good enhancement, but the definition of "unsafe" in this case might be almost anything we do. This change chops off a piece of the top of the iceberg as observed. Thanks for all the code reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6606 From coleenp at openjdk.java.net Sat Dec 4 13:08:19 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Sat, 4 Dec 2021 13:08:19 GMT Subject: Integrated: 8265150: AsyncGetCallTrace crashes on ResourceMark In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore wrote: > This change seems to keep the test case in the bug from crashing in the ResourceMark destructor. We have a ResourceMark during stack walking in AsyncGetCallTrace. Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk. > Testing tier1-6 in progress. This pull request has now been integrated. Changeset: 267c024e Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/267c024eb52acd1611188dd5b1417b877ff3eafd Stats: 9 lines in 2 files changed: 0 ins; 3 del; 6 mod 8265150: AsyncGetCallTrace crashes on ResourceMark Reviewed-by: dholmes, stuefe, eosterlund, sspitsyn ------------- PR: https://git.openjdk.java.net/jdk/pull/6606 From dholmes at openjdk.java.net Sun Dec 5 12:04:12 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 5 Dec 2021 12:04:12 GMT Subject: RFR: 8278143: Remove unused "argc" from ConstantPool::copy_bootstrap_arguments_at_impl [v2] In-Reply-To: References: Message-ID: On Fri, 3 Dec 2021 08:23:41 GMT, Aleksey Shipilev wrote: >> SonarCloud complains about variable initialization in large conditional statements. In that tower of if-expressions, it is not even clear to me that `argc` would be properly initialized. Seems better to clean that up. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug build >> - [ ] GHA > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Tidy up the comment Marked as reviewed by dholmes (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6671 From shade at openjdk.java.net Sun Dec 5 21:40:10 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 5 Dec 2021 21:40:10 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: <_tpNEJ7RnKYConrel3BdMSsn7DRHOZX1i1lqX3yVZD8=.82735e32-7713-4d6f-8142-9bba01cbee48@github.com> References: <_tpNEJ7RnKYConrel3BdMSsn7DRHOZX1i1lqX3yVZD8=.82735e32-7713-4d6f-8142-9bba01cbee48@github.com> Message-ID: On Thu, 2 Dec 2021 07:29:31 GMT, Vladimir Kozlov wrote: > I don't see issues with these changes in my testing. I submitted our tier1,2,3 testing in internal infra. Vladimir, are internal infra results good? ------------- PR: https://git.openjdk.java.net/jdk/pull/6622 From shade at openjdk.java.net Sun Dec 5 21:42:16 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 5 Dec 2021 21:42:16 GMT Subject: Integrated: 8278143: Remove unused "argc" from ConstantPool::copy_bootstrap_arguments_at_impl In-Reply-To: References: Message-ID: On Thu, 2 Dec 2021 12:49:45 GMT, Aleksey Shipilev wrote: > SonarCloud complains about variable initialization in large conditional statements. In that tower of if-expressions, it is not even clear to me that `argc` would be properly initialized. Seems better to clean that up. > > Additional testing: > - [x] Linux x86_64 fastdebug build > - [x] GHA This pull request has now been integrated. 
Changeset: 839b6067 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/839b6067c85cfc260803af9b01dd1e7e7f8388db Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8278143: Remove unused "argc" from ConstantPool::copy_bootstrap_arguments_at_impl Reviewed-by: lfoltan, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6671 From kvn at openjdk.java.net Sun Dec 5 22:23:11 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 5 Dec 2021 22:23:11 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: References: Message-ID: <58oAR2iULxhHhO3as1oPSIHAQKL3an6BOgafpjbvOPQ=.25df644d-a7c1-43ea-a18c-e838cb245171@github.com> On Tue, 30 Nov 2021 20:44:43 GMT, Aleksey Shipilev wrote: >> I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there. >> >> Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors. But these tests are rather fast themselves and cover important compiler features. >> >> We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3. 
>> >> Sample times for new subgroups (think about this as "How much time they add to existing tiers"): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 >> ============================== >> >> real 2m16.518s >> user 35m40.839s >> sys 1m35.334s >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 >> ============================== >> >> real 4m31.935s >> user 71m54.617s >> sys 2m13.073s > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Filter out tier1/2 groups too Yes, all passed. You can integrate. ------------- PR: https://git.openjdk.java.net/jdk/pull/6622 From duke at openjdk.java.net Mon Dec 6 05:50:18 2021 From: duke at openjdk.java.net (Vishal Chand) Date: Mon, 6 Dec 2021 05:50:18 GMT Subject: RFR: 8277372: Add getters for BOT and card table members [v2] In-Reply-To: References: Message-ID: On Sat, 27 Nov 2021 12:10:17 GMT, Thomas Schatzl wrote: >> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactoring in hotspot/cpu dir > > @tstuefe : can you check whether the s390 and ppc changes still compile? The changes look straightforward enough, but... > > Thanks, > Thomas @tschatzl is there anything else required from my end? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6570 From shade at openjdk.java.net Mon Dec 6 06:30:18 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Dec 2021 06:30:18 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 20:44:43 GMT, Aleksey Shipilev wrote: >> I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there. >> >> Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors. But these tests are rather fast themselves and cover important compiler features. >> >> We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3. >> >> Sample times for new subgroups (think about this as "How much time they add to existing tiers"): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 >> ============================== >> >> real 2m16.518s >> user 35m40.839s >> sys 1m35.334s >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 >> ============================== >> >> real 4m31.935s >> user 71m54.617s >> sys 2m13.073s > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Filter out tier1/2 groups too Thank you! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6622 From shade at openjdk.java.net Mon Dec 6 06:30:19 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Dec 2021 06:30:19 GMT Subject: Integrated: 8278016: Add compiler tests to tier{2,3} In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 19:29:36 GMT, Aleksey Shipilev wrote: > I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there. > > Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors. But these tests are rather fast themselves and cover important compiler features. > > We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3. > > Sample times for new subgroups (think about this as "How much time they add to existing tiers"): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 > ============================== > > real 2m16.518s > user 35m40.839s > sys 1m35.334s > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 > ============================== > > real 4m31.935s > user 71m54.617s > sys 2m13.073s This pull request has now been integrated. 
Changeset: f180a459 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/f180a4591f52d0af0c030aa85be33c51b06c90ee Stats: 46 lines in 1 file changed: 46 ins; 0 del; 0 mod 8278016: Add compiler tests to tier{2,3} Reviewed-by: kvn, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6622 From pli at openjdk.java.net Mon Dec 6 09:09:16 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Mon, 6 Dec 2021 09:09:16 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 17:24:18 GMT, Andrew Haley wrote: >> Arraycopy partial inlining is a C2 compiler technique that avoids stub >> call overhead in small-sized arraycopy operations by generating masked >> vector instructions. So far it works on x86 AVX512 only and this patch >> enables it on AArch64 with SVE. >> >> We add AArch64 matching rule for VectorMaskGenNode and refactor that >> node a little bit. The major change is moving the element type field >> into its TypeVectMask bottom type. The reason is that AArch64 vector >> masks are different for different vector element types. >> >> E.g., an x86 AVX512 vector mask value masking 3 least significant vector >> lanes (of any type) is like >> >> `0000 0000 ... 0000 0000 0000 0000 0111` >> >> On AArch64 SVE, this mask value can only be used for masking the 3 least >> significant lanes of bytes. But for 3 lanes of ints, the value should be >> >> `0000 0000 ... 0000 0000 0001 0001 0001` >> >> where the least significant bit of each lane matters. So AArch64 matcher >> needs to know the vector element type to generate right masks. 
>>
>> After this patch, the C2 generated code for copying a 50-byte array on
>> AArch64 SVE looks like
>>
>>     mov x12, #0x32
>>     whilelo p0.b, xzr, x12
>>     add x11, x11, #0x10
>>     ld1b {z16.b}, p0/z, [x11]
>>     add x10, x10, #0x10
>>     st1b {z16.b}, p0, [x10]
>>
>> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
>> both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested
>> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
>> size arguments on a 512-bit SVE-featured CPU. We got below performance
>> data changes.
>>
>> Benchmark                  (length)  (Performance)
>> ArrayCopyAligned.testByte        10          -2.6%
>> ArrayCopyAligned.testByte        20          +4.7%
>> ArrayCopyAligned.testByte        30          +4.8%
>> ArrayCopyAligned.testByte        40         +21.7%
>> ArrayCopyAligned.testByte        50         +22.5%
>> ArrayCopyAligned.testByte        60         +28.4%
>>
>> The test machine has SVE vector size of 512 bits, so we see performance
>> gain for most array sizes less than 64 bytes. For very small arrays we
>> see a bit regression because a vector load/store may be a bit slower
>> than 1 or 2 scalar loads/stores.
>
> Hurrah! I have managed to duplicate your results.
>
> Old:
>
> Benchmark                  (length)  Mode  Cnt   Score   Error  Units
> ArrayCopyAligned.testByte        40  avgt    5  23.332 ± 0.016  ns/op
>
>
> New:
>
> ArrayCopyAligned.testByte        40  avgt    5  18.092 ± 0.093  ns/op
>
>
> ... and in fact your result is much better than this suggests, because the bulk of the test is fetching all of the arguments to arraycopy, not actually copying the bytes. I get it now.

Hi @theRealAph , are you still looking at this? I have another big fix which depends on the vector mask change inside this patch. So I hope this can be integrated soon.
------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From mdoerr at openjdk.java.net Mon Dec 6 09:36:24 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 6 Dec 2021 09:36:24 GMT Subject: RFR: 8253860: PPC: Relocation::pd_set_data_value conflates compressed oops and klasses Message-ID: Casting narrow Klass pointer to a narrow Oop is problematic. Note that `NativeMovConstReg::set_narrow_oop` only supports narrow Oops. It turns out, that the problematic code is unused. We never patch narrow Klass pointers in the instruction stream on PPC64 (metadata_Relocation::pd_fix_value has an empty implementation). In contrast to that, narrow Oops in the instructions stream always get patched when the nmethod gets installed (by fix_oop_relocations). This makes sense as Metadata doesn't get relocated, but Oops may be moved by GC and the instructions need to get the current value during nmethod installation. Note that the initial constants we were using for narrow Oops in the instruction stream were not correct (Oop compression missing, not updated by GC). So, I think it's better to use 0 to avoid confusion. ------------- Commit messages: - 8253860: PPC: Relocation::pd_set_data_value conflates compressed oops and klasses Changes: https://git.openjdk.java.net/jdk/pull/6716/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6716&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253860 Stats: 39 lines in 5 files changed: 1 ins; 24 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/6716.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6716/head:pull/6716 PR: https://git.openjdk.java.net/jdk/pull/6716 From shade at openjdk.java.net Mon Dec 6 10:17:52 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Dec 2021 10:17:52 GMT Subject: RFR: 8277893: Arraycopy stress tests [v3] In-Reply-To: References: Message-ID: > I would like to fork the new tests off the JDK-8150730. 
These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. > > A brief tour of these tests: > > - Tests all data types; > - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; > - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; > - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; > - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; > > My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. > > Test times: > > > # x86_64 (TR 3970X) > real 6m37.855s > user 56m23.004s > sys 0m20.148s > > # x86_32 (TR 3970X) > real 11m22.877s > user 168m8.137s > sys 5m7.037s > > # x86_64 (i5-11500) > real 15m55.424s > user 118m0.969s > sys 0m12.039s > > # AArch64 (ThunderX2) > real 4m5.177s > user 32m7.295s > sys 0m19.689s > > > Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. > > Additional testing: > - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` > - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` > - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: - Package declarations - Add safety check for small systems - Renames - Single driver for all the tests - Safer timeout settings - Post-merge TEST.groups cleanup - Merge branch 'master' into JDK-8277893-arraycopy-tests - Merge branch 'master' into JDK-8277893-arraycopy-tests - Separate test group and hooks into hotspot_slow_compiler - Trim down MAX_SIZE and explain the choice - ... and 1 more: https://git.openjdk.java.net/jdk/compare/228a50d8...118a3eb2 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6594/files - new: https://git.openjdk.java.net/jdk/pull/6594/files/da7ed51e..118a3eb2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6594&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6594&range=01-02 Stats: 10957 lines in 493 files changed: 6701 ins; 2473 del; 1783 mod Patch: https://git.openjdk.java.net/jdk/pull/6594.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6594/head:pull/6594 PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Mon Dec 6 10:56:21 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Dec 2021 10:56:21 GMT Subject: RFR: 8277893: Arraycopy stress tests [v3] In-Reply-To: References: Message-ID: On Mon, 6 Dec 2021 10:17:52 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. 
>> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. >> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 4m6.192s >> user 52m50.523s >> sys 0m13.755s >> >> # x86_64 (TR 3970X) -XX:+UseZGC >> real 6m2.573s >> user 72m43.541s >> sys 0m25.697s >> >> # x86_32 (TR 3970X) >> real 6m56.405s >> user 92m56.377s >> sys 0m6.677s >> >> # x86_64 (i5-11500) >> real 29m19.024s >> user 103m52.925s >> sys 1m7.175s >> >> # AArch64 (ThunderX2) >> real 2m59.623s >> user 26m14.624s >> sys 0m9.771s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: > > - Package declarations > - Add safety check for small systems > - Renames > - Single driver for all the tests > - Safer timeout settings > - Post-merge TEST.groups cleanup > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - Separate test group and hooks into hotspot_slow_compiler > - Trim down MAX_SIZE and explain the choice > - ... and 1 more: https://git.openjdk.java.net/jdk/compare/3fffcc8b...118a3eb2 Yes, definitely GC arraycopy barriers in object array copy cases. Shenandoah cuts the overhead in half by emitting the runtime check on GC state, so it can skip calling to runtime in most cases. ZGC calls to runtime for all object copies. This adds up quite a bit for small object arrays. I looked whether to trim down the array sizes we feed into these tests, but that does not look compelling to me, as these tests were useful with current settings in arraycopy improvements work. The GC specific overheads also make object array tests take a disproportionate amount of time, wrecking parallelism that might have helped to subsume these penalties. So, new work: a) Sets up larger timeouts to cater for slow machines; b) Rewrites the per-type tests to use a single driver, so it can balance over single-type jobs; c) Defaults the stress parallelism to `N_CPU / 4` New test times are updated in PR body. @vnkozlov, would you like to try this in Oracle infra again? 
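
For readers following along, the "fuzzing around powers of two and powers of ten" and the `N_CPU / 4` default mentioned above can be sketched roughly as below. This is only an illustration, not the code from the PR: the class name, the `sizesAround` and `defaultParallelism` helpers, the fuzz width, and the lower bound of one worker are all assumptions made for this sketch.

```java
import java.util.TreeSet;

public class ArraycopySizesSketch {
    // Hypothetical helper: collects every size within +/- fuzz of each
    // power of `base` that does not exceed maxSize.
    public static TreeSet<Integer> sizesAround(int base, int maxSize, int fuzz) {
        if (base < 2) {
            throw new IllegalArgumentException("base must be at least 2");
        }
        TreeSet<Integer> sizes = new TreeSet<>();
        // long arithmetic so that p *= base cannot overflow before the bound check
        for (long p = base; p <= maxSize; p *= base) {
            for (long s = p - fuzz; s <= p + fuzz; s++) {
                if (s >= 0 && s <= maxSize) {
                    sizes.add((int) s);
                }
            }
        }
        return sizes;
    }

    // Assumption: a quarter of the available CPUs, but never fewer than one worker.
    public static int defaultParallelism() {
        return Math.max(1, Runtime.getRuntime().availableProcessors() / 4);
    }

    public static void main(String[] args) {
        System.out.println("around powers of two: " + sizesAround(2, 16, 1));
        System.out.println("around powers of ten: " + sizesAround(10, 1000, 1));
        System.out.println("parallelism: " + defaultParallelism());
    }
}
```

Sizes clustered at the boundaries are where copy loops switch between vector widths and tail handling, which is presumably why the tests concentrate there.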
------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From aph at openjdk.java.net Mon Dec 6 11:39:11 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 6 Dec 2021 11:39:11 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: <3FSZTQE51oLz9b3VnL8ydkDT8FgT7VcIQhFM6-BVUTQ=.5671f668-df59-48ed-9cef-47de3a5b3c20@github.com> On Thu, 18 Nov 2021 17:24:18 GMT, Andrew Haley wrote: >> Arraycopy partial inlining is a C2 compiler technique that avoids stub >> call overhead in small-sized arraycopy operations by generating masked >> vector instructions. So far it works on x86 AVX512 only and this patch >> enables it on AArch64 with SVE. >> >> We add AArch64 matching rule for VectorMaskGenNode and refactor that >> node a little bit. The major change is moving the element type field >> into its TypeVectMask bottom type. The reason is that AArch64 vector >> masks are different for different vector element types. >> >> E.g., an x86 AVX512 vector mask value masking 3 least significant vector >> lanes (of any type) is like >> >> `0000 0000 ... 0000 0000 0000 0000 0111` >> >> On AArch64 SVE, this mask value can only be used for masking the 3 least >> significant lanes of bytes. But for 3 lanes of ints, the value should be >> >> `0000 0000 ... 0000 0000 0001 0001 0001` >> >> where the least significant bit of each lane matters. So AArch64 matcher >> needs to know the vector element type to generate right masks. >> >> After this patch, the C2 generated code for copying a 50-byte array on >> AArch64 SVE looks like >> >> mov x12, #0x32 >> whilelo p0.b, xzr, x12 >> add x11, x11, #0x10 >> ld1b {z16.b}, p0/z, [x11] >> add x10, x10, #0x10 >> st1b {z16.b}, p0, [x10] >> >> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on >> both x86 AVX512 and AArch64 SVE machines, no issue is found. 
We tested
>> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
>> size arguments on a 512-bit SVE-featured CPU. We got below performance
>> data changes.
>>
>> Benchmark                  (length)  (Performance)
>> ArrayCopyAligned.testByte        10          -2.6%
>> ArrayCopyAligned.testByte        20          +4.7%
>> ArrayCopyAligned.testByte        30          +4.8%
>> ArrayCopyAligned.testByte        40         +21.7%
>> ArrayCopyAligned.testByte        50         +22.5%
>> ArrayCopyAligned.testByte        60         +28.4%
>>
>> The test machine has SVE vector size of 512 bits, so we see performance
>> gain for most array sizes less than 64 bytes. For very small arrays we
>> see a bit regression because a vector load/store may be a bit slower
>> than 1 or 2 scalar loads/stores.
>
> Hurrah! I have managed to duplicate your results.
>
> Old:
>
> Benchmark                  (length)  Mode  Cnt   Score   Error  Units
> ArrayCopyAligned.testByte        40  avgt    5  23.332 ± 0.016  ns/op
>
>
> New:
>
> ArrayCopyAligned.testByte        40  avgt    5  18.092 ± 0.093  ns/op
>
>
> ... and in fact your result is much better than this suggests, because the bulk of the test is fetching all of the arguments to arraycopy, not actually copying the bytes. I get it now.

> Hi @theRealAph , are you still looking at this? I have another big fix which depends on the vector mask change inside this patch. So I hope this can be integrated soon.

I'm quite happy with the AArch64 parts, but I'm not familiar with that part of the C2 compiler. I think you need an additional reviewer, perhaps @rwestrel .

-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From pli at openjdk.java.net Mon Dec 6 12:14:08 2021
From: pli at openjdk.java.net (Pengfei Li)
Date: Mon, 6 Dec 2021 12:14:08 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To:
References:
Message-ID:

On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote:

> Arraycopy partial inlining is a C2 compiler technique that avoids stub
> call overhead in small-sized arraycopy operations by generating masked
> vector instructions.
So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. > > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types. > > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. So AArch64 matcher > needs to know the vector element type to generate right masks. > > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. > > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. Thanks Andrew! Can any reviewer help look at the C2 mid-end part? 
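
For reviewers less familiar with the two mask layouts quoted above, here is a small sketch of the difference. It is a toy model, not C2 or HotSpot code: the class and method names are made up for illustration, and a mask is modelled as a `long`, which is enough for the 64 predicate bits of a 512-bit SVE vector (one predicate bit per vector byte).

```java
public class MaskLayoutSketch {
    // AVX-512 style (as described in the quoted text): one mask bit per lane,
    // whatever the lane type is.
    public static long perLaneMask(int lanes) {
        return lanes >= 64 ? -1L : (1L << lanes) - 1;
    }

    // SVE style: one predicate bit per vector byte, so lane i of an element
    // that is elemBytes wide is controlled by bit i * elemBytes.
    public static long perByteMask(int lanes, int elemBytes) {
        long mask = 0;
        for (int i = 0; i < lanes; i++) {
            mask |= 1L << (i * elemBytes);
        }
        return mask;
    }

    public static void main(String[] args) {
        // 3 lanes, one bit per lane
        System.out.println(Long.toBinaryString(perLaneMask(3)));
        // 3 byte lanes under the per-byte layout: same bits as above
        System.out.println(Long.toBinaryString(perByteMask(3, 1)));
        // 3 int lanes under the per-byte layout: one set bit every 4 positions
        System.out.println(Long.toBinaryString(perByteMask(3, 4)));
    }
}
```

The same lane count yields different bit patterns once the element is wider than a byte, which is the reason given in the description for carrying the element type in the mask's type.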
-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From ddong at openjdk.java.net  Mon Dec  6 12:15:15 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 6 Dec 2021 12:15:15 GMT
Subject: RFR: 8277948: AArch64: Print the correct native stack if -XX:+PreserveFramePointer when crash
In-Reply-To:
References:
Message-ID:

On Tue, 30 Nov 2021 10:21:05 GMT, Andrew Haley wrote:

> Thank you for this. I'll have a look.
>
> Stack unwinding on AArch64 C/C++ uses call frame information, which is in a separate section in the binary file. This allows the stack to be fully traced, even if there is no frame pointer. There is a library, libunwind, which does this. But that won't work with Java, which has its own way to do it
>
> It would be nice to get -XX:+PreserveFramePointer working correctly.

Thanks for the response.

I also noticed that a java method will reserve 2 words when this method makes a vm leaf call:

aarch64.ad

aarch64_enc_java_to_runtime

      Label retaddr;
      __ adr(rscratch2, retaddr);
      __ lea(rscratch1, RuntimeAddress(entry));
      // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc()
      __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize)));
      __ blr(rscratch1);
      __ bind(retaddr);
      __ add(sp, sp, 2 * wordSize);

MacroAssembler::call_VM_leaf_base

      stp(rscratch1, rmethod, Address(pre(sp, -2 * wordSize)));

      mov(rscratch1, entry_point);
      blr(rscratch1);
      if (retaddr)
        bind(*retaddr);

      ldp(rscratch1, rmethod, Address(post(sp, 2 * wordSize)));

I haven't figured out the specific purpose of this operation, but I think it will make the logic of stack walking more complicated.
-------------

PR: https://git.openjdk.java.net/jdk/pull/6597

From aph at openjdk.java.net  Mon Dec  6 13:49:11 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 6 Dec 2021 13:49:11 GMT
Subject: RFR: 8277948: AArch64: Print the correct native stack if -XX:+PreserveFramePointer when crash
In-Reply-To:
References:
Message-ID:

On Mon, 6 Dec 2021 12:11:42 GMT, Denghui Dong wrote:

> > Thank you for this. I'll have a look.
> > Stack unwinding on AArch64 C/C++ uses call frame information, which is in a separate section in the binary file. This allows the stack to be fully traced, even if there is no frame pointer. There is a library, libunwind, which does this. But that won't work with Java, which has its own way to do it
> > It would be nice to get -XX:+PreserveFramePointer working correctly.
>
> Thanks for the response.
>
> I also noticed that a java method will reserve 2 words when this method makes a vm leaf call:

It's not reserving anything, it's saving the PC for the stack unwinder.

> ```
> aarch64.ad
>
> aarch64_enc_java_to_runtime
>
>       Label retaddr;
>       __ adr(rscratch2, retaddr);
>       __ lea(rscratch1, RuntimeAddress(entry));
>       // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc()
>       __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize)));
>       __ blr(rscratch1);
>       __ bind(retaddr);
>       __ add(sp, sp, 2 * wordSize);
> ```

I wrote it. If you look at `JavaFrameAnchor::capture_last_Java_pc()` you'll see it being used.

> ```
> MacroAssembler::call_VM_leaf_base
>
>
>       stp(rscratch1, rmethod, Address(pre(sp, -2 * wordSize)));
>
>       mov(rscratch1, entry_point);
>       blr(rscratch1);
>       if (retaddr)
>         bind(*retaddr);
>
>       ldp(rscratch1, rmethod, Address(post(sp, 2 * wordSize)));
> ```
>
> I haven't figured out the specific purpose of this operation, but I think it will make the logic of stack walking more complicated.

All this is doing is saving `rmethod` (which is in a call-clobbered register) around a VM call. `retaddr` is saved for OOP maps.
-------------

PR: https://git.openjdk.java.net/jdk/pull/6597

From duke at openjdk.java.net  Mon Dec  6 15:32:24 2021
From: duke at openjdk.java.net (Vishal Chand)
Date: Mon, 6 Dec 2021 15:32:24 GMT
Subject: Integrated: 8277372: Add getters for BOT and card table members
In-Reply-To:
References:
Message-ID:

On Fri, 26 Nov 2021 07:33:44 GMT, Vishal Chand wrote:

> Changed the visibility, added getters and refactored the following:
>
> 1. Card Table Members
> 2. BOT members
> 3. ObjectStartArray block members

This pull request has now been integrated.

Changeset: adf39522
Author:    Vishal Chand
Committer: Thomas Schatzl
URL:       https://git.openjdk.java.net/jdk/commit/adf39522c178b82dc73e341751b2d9aba984469d
Stats:     217 lines in 40 files changed: 41 ins; 12 del; 164 mod

8277372: Add getters for BOT and card table members

Reviewed-by: tschatzl, sjohanss, ayang

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From tschatzl at openjdk.java.net  Mon Dec  6 15:37:16 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Mon, 6 Dec 2021 15:37:16 GMT
Subject: RFR: 8278146: G1: Rework VM_G1Concurrent VMOp to clearly identify it as pause
In-Reply-To:
References:
Message-ID:

On Thu, 2 Dec 2021 18:22:56 GMT, Aleksey Shipilev wrote:

> Our support engineers asked this:
>
>> I see these G1Concurrent safepoints in JDK17:
>> [0.064s][info][safepoint ] Safepoint "G1Concurrent", Time since last: 1666947 ns, Reaching
> safepoint: 79150 ns, At safepoint: 349999 ns, Total: 429149 ns
>> I've always thought that "concurrent" and "safepoint" are basically antonyms.
>> What is a G1Concurrent safepoint? How can a concurrent event require a safepoint?
>
> I agree that's confusing.
This patch splits the VM_G1Concurrent into two exactly named VMOp-s, so that we get:
>
>
> [6.527s][info][gc ] GC(7) Pause Remark 64M->64M(224M) 218.847ms
> [6.527s][info][safepoint] Safepoint "G1PauseRemark", Time since last: 17493991 ns, Reaching safepoint: 506830 ns, At safepoint: 218950374 ns, Total: 219457204 ns
> [6.536s][info][gc ] GC(7) Pause Cleanup 71M->71M(224M) 0.177ms
> [6.536s][info][safepoint] Safepoint "G1PauseCleanup", Time since last: 8250157 ns, Reaching safepoint: 884967 ns, At safepoint: 223964 ns, Total: 1108931 ns
> [6.537s][info][gc ] GC(7) Concurrent Mark Cycle 247.051ms
>
>
> Additional testing:
> - [x] Linux x86_64 fastdebug `tier1`

Changes requested by tschatzl (Reviewer).

src/hotspot/share/gc/g1/g1VMOperations.hpp line 91:

> 89: // Concurrent G1 stop-the-world operations such as remark and cleanup.
> 90: class VM_G1PauseConcurrent : public VM_Operation {
> 91: private:

This `private` visibility specifier could be removed.

src/hotspot/share/gc/g1/g1VMOperations.hpp line 91:

> 89: // Concurrent G1 stop-the-world operations such as remark and cleanup.
> 90: class VM_G1PauseConcurrent : public VM_Operation {
> 91: private:

This `private` visibility specifier could be removed.

src/hotspot/share/gc/g1/g1VMOperations.hpp line 103:

> 101:   virtual void doit_epilogue();
> 102:   virtual void doit();
> 103:   virtual void work() = 0;

Please make `work()` protected - there does not seem to be a need to make it public.
-------------

PR: https://git.openjdk.java.net/jdk/pull/6677

From ddong at openjdk.java.net  Mon Dec  6 15:50:11 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 6 Dec 2021 15:50:11 GMT
Subject: RFR: 8277948: AArch64: Print the correct native stack if -XX:+PreserveFramePointer when crash
In-Reply-To:
References:
Message-ID:

On Mon, 29 Nov 2021 17:40:43 GMT, Denghui Dong wrote:

> Hi,
>
> I found that the native stack frames in the hs log are not accurate sometimes on AArch64, not sure if this is a known issue or an issue worth fixing.
>
> The following steps can quick reproduce the problem:
>
> 1. apply the diff(comment the dtrace_object_alloc call in interpreter and make a crash on SharedRuntime::dtrace_object_alloc)
>
> index 39e99bdd5ed..4fc768e94aa 100644
> --- a/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
> +++ b/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
> @@ -3558,6 +3558,7 @@ void TemplateTable::_new() {
>        __ store_klass_gap(r0, zr);  // zero klass gap for compressed oops
>        __ store_klass(r0, r4);      // store klass last
>
> +/**
>      {
>        SkipIfEqual skip(_masm, &DTraceAllocProbes, false);
>        // Trigger dtrace event for fastpath
> @@ -3567,6 +3568,7 @@ void TemplateTable::_new() {
>        __ pop(atos); // restore the return value
>
>      }
> +*/
>      __ b(done);
>    }
>
> diff --git a/src/hotspot/cpu/x86/templateTable_x86.cpp b/src/hotspot/cpu/x86/templateTable_x86.cpp
> index 19530b7c57c..15b0509da4c 100644
> --- a/src/hotspot/cpu/x86/templateTable_x86.cpp
> +++ b/src/hotspot/cpu/x86/templateTable_x86.cpp
> @@ -4033,6 +4033,7 @@ void TemplateTable::_new() {
>      Register tmp_store_klass = LP64_ONLY(rscratch1) NOT_LP64(noreg);
>      __ store_klass(rax, rcx, tmp_store_klass);  // klass
>
> +/**
>      {
>        SkipIfEqual skip_if(_masm, &DTraceAllocProbes, 0);
>        // Trigger dtrace event for fastpath
> @@ -4041,6 +4042,7 @@ void TemplateTable::_new() {
>             CAST_FROM_FN_PTR(address, static_cast(SharedRuntime::dtrace_object_alloc)), rax);
>        __ pop(atos);
>      }
> +*/
>
>      __ jmp(done);
>    }
> diff --git a/src/hotspot/share/runtime/sharedRuntime.cpp b/src/hotspot/share/runtime/sharedRuntime.cpp
> index a5de65ea5ab..60b4bd3bcc8 100644
> --- a/src/hotspot/share/runtime/sharedRuntime.cpp
> +++ b/src/hotspot/share/runtime/sharedRuntime.cpp
> @@ -1002,6 +1002,7 @@ jlong SharedRuntime::get_java_tid(Thread* thread) {
>   * 6254741.  Once that is fixed we can remove the dummy return value.
>   */
>  int SharedRuntime::dtrace_object_alloc(oopDesc* o) {
> +  *(int*)0 = 1;
>    return dtrace_object_alloc(Thread::current(), o, o->size());
>  }
>
>
> 2. `java -XX:+DTraceAllocProbes -Xcomp -XX:-PreserveFramePointer -version`
>
> On x86_64, the native stack in hs log is complete, but on AArch64, the native stack is incorrect.
>
> In the beginning, I thought it might be the influence of PreserveFramePointer. Later, I found that no matter whether PreserveFramePointer is enabled or not, in the hs log of x86_64, the native stack is always correct, and aarch64 is wrong.
>
> After some investigation, I found that this problem is related to the layout of the stack.
>
> On x86_64, whether it is C/C++, interpreter, or JIT, `callee` will always put the `return address` and `fp` of the `caller` at the bottom of the stack.
> Hence, `callee` can always get the `caller sp`(aka `sender sp`) by `fp + 2`, and if `caller` is a compiled method, `caller sp` is the key to getting the `caller`'s `caller` since `caller fp` may be invalid.(see frame::sender_for_compiled_frame).
>
>
>     push   %rbp
>     mov    %rsp,%rbp
>
>                        _ _ _ _ _ _
>                       |           |
>                       |           |                |
>                       |_ _ _ _ _ _|                |
>                       |           |                |
>           caller      |           | <- caller sp   |
>           _ _ _       |_ _ _ _ _ _|                | expand
>                       |           |                |
>                       | ret addr  |                | direction
>           callee      |_ _ _ _ _ _|                |
>                       |           |                V
>                       | caller fp | <- fp
>                       |_ _ _ _ _ _|
>
>
>
> But for AArch64, the C/C++ code doesn't put the `return address` and `fp` of the `caller` at the bottom of the stack.
> Hence, we cannot use `fp + 2` to calculate the proper `caller sp`(although it is still implemented this way).
>
> When `caller` is a C1/C2 method A, and `callee` is a C/C++ method B, we cannot get the `caller` of A since we cannot get the proper sp value of it.
>
>
>     stp x29, x30, [sp, #-N]!
>     mov x29, sp
>
>                        _ _ _ _ _ _
>                       |           |
>                       |           |                |
>                       |_ _ _ _ _ _|                |
>                       |           |                |
>           caller      |           | <- caller sp   |
>           _ _ _       |_ _ _ _ _ _|  -             | expand
>                                      |             |
>                        . . . . .     |             | direction
>                        _ _ _ _ _ _   |             |
>                       |           |  | N           |
>                       | ret addr  |  |             |
>           callee      |_ _ _ _ _ _|  |             |
>                       |           |  -             V
>                       | caller fp | <- fp
>                       |_ _ _ _ _ _|
>
>
>
> I am not very familiar with AArch64 and have no idea how to fix this issue perfectly at current.
>
> Based on my understanding of the implementation, we can get the correct stack trace when PreserveFramePointer is enabled.
>
> Although PreserveFramePointer is disabled by default, I found that some real applications will enable it in the production environment.
> Therefore, in my opinion, this fix can help troubleshoot crash issues in applications that enable PreserveFramePointer on AArch64 platform.
>
> This patch changes the logic of l_sender_sp calculation, uses sender_sp() as the value of l_sender_sp when PreserveFramePointer is enabled.
>
> Any input is appreciated.
>
> Thanks,
> Denghui

Thank you for the explanation:-)

-------------

PR: https://git.openjdk.java.net/jdk/pull/6597

From phedlin at openjdk.java.net  Mon Dec  6 16:16:32 2021
From: phedlin at openjdk.java.net (Patric Hedlin)
Date: Mon, 6 Dec 2021 16:16:32 GMT
Subject: RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64
Message-ID:

Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support.

Implementation focusing on balance between small footprint and efficiency, trying to utilise a dual SIMD path (e.g. Neoverse N1) for the additional Ascii-check.
Testing: tier1-6 Benchmarks (ran on Aurora/Ampere Altra): openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ASCII..........72.23% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:BIG5...........70.38% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_15....67.81% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_16......... 3.72% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_8..........68.50% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:ASCII...........65.59% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:BIG5............60.59% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:ISO_8859_15.....63.79% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:UTF_16.......... 1.04% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:UTF_8...........63.33% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:ASCII............57.25% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:BIG5.............49.33% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:ISO_8859_15......61.37% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:UTF_16........... 
0.02% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:UTF_8............54.75% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ASCII............54.52% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:BIG5.............40.41% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_15......58.46% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_16...........-0.55% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_8............55.98% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ASCII............47.37% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:BIG5.............36.41% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_15......50.83% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_16........... 8.63% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_8............48.95% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ASCII.............17.55% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:BIG5..............18.58% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_15.......20.82% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_16............ 4.16% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_8.............18.44% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ASCII.............21.96% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:BIG5..............22.42% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_15.......30.27% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_16............-1.17% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_8.............35.99% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ASCII............. 
6.19% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:BIG5.............. 7.34% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_15....... 8.34% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_16............-0.46% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_8............. 6.80% ------------- Commit messages: - Removing old implementation of encode_iso_array(). - Interleaved ISO and ASCII check code. Using post inc in main loop. - 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 Changes: https://git.openjdk.java.net/jdk/pull/6723/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6723&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274243 Stats: 218 lines in 5 files changed: 89 ins; 92 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/6723.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6723/head:pull/6723 PR: https://git.openjdk.java.net/jdk/pull/6723 From phedlin at openjdk.java.net Mon Dec 6 16:16:33 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Mon, 6 Dec 2021 16:16:33 GMT Subject: RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: Message-ID: On Mon, 6 Dec 2021 14:09:07 GMT, Patric Hedlin wrote: > Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support. > > Implementation focusing on balance between small footprint and efficiency, trying to utilise a dual SIMD path (e.g. Neoverse N1) for the additional Ascii-check. 
> > Testing: tier1-6 > > Benchmarks (ran on Aurora/Ampere Altra): > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ASCII..........72.23% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:BIG5...........70.38% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_15....67.81% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_16......... 3.72% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_8..........68.50% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:ASCII...........65.59% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:BIG5............60.59% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:ISO_8859_15.....63.79% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:UTF_16.......... 1.04% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:UTF_8...........63.33% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:ASCII............57.25% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:BIG5.............49.33% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:ISO_8859_15......61.37% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:UTF_16........... 
0.02% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:UTF_8............54.75% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ASCII............54.52% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:BIG5.............40.41% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_15......58.46% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_16...........-0.55% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_8............55.98% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ASCII............47.37% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:BIG5.............36.41% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_15......50.83% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_16........... 8.63% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_8............48.95% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ASCII.............17.55% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:BIG5..............18.58% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_15.......20.82% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_16............ 4.16% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_8.............18.44% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ASCII.............21.96% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:BIG5..............22.42% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_15.......30.27% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_16............-1.17% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_8.............35.99% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ASCII............. 
6.19%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:BIG5.............. 7.34%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_15....... 8.34%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_16............-0.46%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_8............. 6.80%

Current implementation (including prefetch hint).

Benchmark                                                      (charsetName)                          (message)  (timesToAppend)  Mode  Cnt    Score     Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8     This is a simple ASCII message                3  avgt    4  144.807 ±   9.557  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode ??                3  avgt    4  378.458 ± 206.193  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8     This is a simple ASCII message                3  avgt    4  200.844 ±  14.998  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode ??                3  avgt    4  356.589 ±   8.588  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8     This is a simple ASCII message                3  avgt    4  698.518 ±  17.269  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode ??                3  avgt    4  862.678 ±  30.872  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8     This is a simple ASCII message                3  avgt    4  109.413 ±   2.780  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode ??                3  avgt    4  522.050 ±  34.763  ns/op

**-XX:SoftwarePrefetchHintDistance=128**

Benchmark                                                      (charsetName)                          (message)  (timesToAppend)  Mode  Cnt    Score     Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8     This is a simple ASCII message                3  avgt    4  144.519 ±  12.731  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode ??                3  avgt    4  302.409 ±  51.020  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8     This is a simple ASCII message                3  avgt    4  201.144 ±  14.624  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode ??                3  avgt    4  469.724 ±   4.871  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8     This is a simple ASCII message                3  avgt    4  695.666 ±  22.061  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode ??                3  avgt    4  858.812 ±  22.913  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8     This is a simple ASCII message                3  avgt    4  109.598 ±   1.921  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode ??                3  avgt    4  511.589 ±  34.407  ns/op

New implementation (disregards prefetch hint).

Benchmark                                                      (charsetName)                          (message)  (timesToAppend)  Mode  Cnt    Score     Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8     This is a simple ASCII message                3  avgt    4  116.916 ±  14.340  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode ??                3  avgt    4  292.334 ±  10.038  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8     This is a simple ASCII message                3  avgt    4  178.490 ±  11.258  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode ??                3  avgt    4  363.741 ±  13.080  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8     This is a simple ASCII message                3  avgt    4  695.520 ±  20.217  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode ??                3  avgt    4  862.785 ±  13.673  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8     This is a simple ASCII message                3  avgt    4  108.263 ±   6.123  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode ??                3  avgt    4  516.571 ±  22.347  ns/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/6723

From redestad at openjdk.java.net  Mon Dec  6 17:15:20 2021
From: redestad at openjdk.java.net (Claes Redestad)
Date: Mon, 6 Dec 2021 17:15:20 GMT
Subject: RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64
In-Reply-To:
References:
Message-ID:

On Mon, 6 Dec 2021 14:09:07 GMT, Patric Hedlin wrote:

> Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support.
>
> Implementation focusing on balance between small footprint and efficiency, trying to utilise a dual SIMD path (e.g. Neoverse N1) for the additional Ascii-check.
>
> Testing: tier1-6
>
> Benchmarks (ran on Aurora/Ampere Altra):
>
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ASCII..........72.23%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:BIG5...........70.38%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_15....67.81%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_16......... 3.72%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_8..........68.50%
>
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:ASCII...........65.59%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:BIG5............60.59%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:ISO_8859_15.....63.79%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:UTF_16..........
1.04% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:UTF_8...........63.33% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:ASCII............57.25% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:BIG5.............49.33% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:ISO_8859_15......61.37% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:UTF_16........... 0.02% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:UTF_8............54.75% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ASCII............54.52% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:BIG5.............40.41% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_15......58.46% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_16...........-0.55% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_8............55.98% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ASCII............47.37% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:BIG5.............36.41% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_15......50.83% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_16........... 8.63% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_8............48.95% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ASCII.............17.55% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:BIG5..............18.58% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_15.......20.82% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_16............ 
4.16% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_8.............18.44% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ASCII.............21.96% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:BIG5..............22.42% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_15.......30.27% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_16............-1.17% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_8.............35.99% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ASCII............. 6.19% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:BIG5.............. 7.34% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_15....... 8.34% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_16............-0.46% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_8............. 6.80% Great to see this come along, thanks! I can't review the code, but I think it'd be good to collect benchmark scores for `ISO-8859-1` before and after, along with absolute numbers so we see how things stack up against the existing ISO-8859-1-only intrinsic. ------------- PR: https://git.openjdk.java.net/jdk/pull/6723 From shade at openjdk.java.net Mon Dec 6 17:31:39 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Dec 2021 17:31:39 GMT Subject: RFR: 8278146: G1: Rework VM_G1Concurrent VMOp to clearly identify it as pause [v2] In-Reply-To: References: Message-ID: > Our support engineers asked this: > >> I see these G1Concurrent safepoints in JDK17: >> [0.064s][info][safepoint ] Safepoint "G1Concurrent", Time since last: 1666947 ns, Reaching > safepoint: 79150 ns, At safepoint: 349999 ns, Total: 429149 ns >> I've always thought that "concurrent" and "safepoint" are basically antonyms. >> What is a G1Concurrent safepoint? How can a concurrent event require a safepoint? 
> > I agree that's confusing. This patch splits the VM_G1Concurrent into two exactly named VMOp-s, so that we get: > > > [6.527s][info][gc ] GC(7) Pause Remark 64M->64M(224M) 218.847ms > [6.527s][info][safepoint] Safepoint "G1PauseRemark", Time since last: 17493991 ns, Reaching safepoint: 506830 ns, At safepoint: 218950374 ns, Total: 219457204 ns > [6.536s][info][gc ] GC(7) Pause Cleanup 71M->71M(224M) 0.177ms > [6.536s][info][safepoint] Safepoint "G1PauseCleanup", Time since last: 8250157 ns, Reaching safepoint: 884967 ns, At safepoint: 223964 ns, Total: 1108931 ns > [6.537s][info][gc ] GC(7) Concurrent Mark Cycle 247.051ms > > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Review Thomas - Merge branch 'master' into JDK-8278146-g1-concurrent-vmop - Whitespace and touchups - Basic implementation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6677/files - new: https://git.openjdk.java.net/jdk/pull/6677/files/e6649454..06479f45 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6677&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6677&range=00-01 Stats: 6871 lines in 381 files changed: 4591 ins; 890 del; 1390 mod Patch: https://git.openjdk.java.net/jdk/pull/6677.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6677/head:pull/6677 PR: https://git.openjdk.java.net/jdk/pull/6677 From shade at openjdk.java.net Mon Dec 6 17:31:48 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Dec 2021 17:31:48 GMT Subject: RFR: 8278146: G1: Rework VM_G1Concurrent VMOp to clearly identify it as pause [v2] In-Reply-To: References: Message-ID: <80kI6qpwSR5BRzGJePSU5nLP6CoXbB7mCjhDlXYopAE=.ed884923-2ba1-458a-a0c4-e1402b3ddcf8@github.com> On 
Mon, 6 Dec 2021 15:33:13 GMT, Thomas Schatzl wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Review Thomas >> - Merge branch 'master' into JDK-8278146-g1-concurrent-vmop >> - Whitespace and touchups >> - Basic implementation > > src/hotspot/share/gc/g1/g1VMOperations.hpp line 91: > >> 89: // Concurrent G1 stop-the-world operations such as remark and cleanup. >> 90: class VM_G1PauseConcurrent : public VM_Operation { >> 91: private: > > This `private` visibility specifier could be removed. Done. > src/hotspot/share/gc/g1/g1VMOperations.hpp line 91: > >> 89: // Concurrent G1 stop-the-world operations such as remark and cleanup. >> 90: class VM_G1PauseConcurrent : public VM_Operation { >> 91: private: > > This `private` visibility specifier could be removed. Done. > src/hotspot/share/gc/g1/g1VMOperations.hpp line 103: > >> 101: virtual void doit_epilogue(); >> 102: virtual void doit(); >> 103: virtual void work() = 0; > > Please make `work()` protected - there does not seem to be a need to make it public. Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/6677 From kvn at openjdk.java.net Mon Dec 6 22:01:15 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 6 Dec 2021 22:01:15 GMT Subject: RFR: 8277893: Arraycopy stress tests [v3] In-Reply-To: References: Message-ID: <8gzkgbvL-ist_ZhNekJ5V7MX5hqjrjtfgKdF84LZT5E=.9f743cfa-8f2b-4497-9f58-c9cf5360b1bc@github.com> On Mon, 6 Dec 2021 10:17:52 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. 
These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. >> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. >> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 4m6.192s >> user 52m50.523s >> sys 0m13.755s >> >> # x86_64 (TR 3970X) -XX:+UseZGC >> real 6m2.573s >> user 72m43.541s >> sys 0m25.697s >> >> # x86_32 (TR 3970X) >> real 6m56.405s >> user 92m56.377s >> sys 0m6.677s >> >> # x86_64 (i5-11500) >> real 29m19.024s >> user 103m52.925s >> sys 1m7.175s >> >> # AArch64 (ThunderX2) >> real 2m59.623s >> user 26m14.624s >> sys 0m9.771s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Package declarations > - Add safety check for small systems > - Renames > - Single driver for all the tests > - Safer timeout settings > - Post-merge TEST.groups cleanup > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - Separate test group and hooks into hotspot_slow_compiler > - Trim down MAX_SIZE and explain the choice > - ... and 1 more: https://git.openjdk.java.net/jdk/compare/b83c9348...118a3eb2 I started testing. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From sspitsyn at openjdk.java.net Tue Dec 7 00:35:27 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 7 Dec 2021 00:35:27 GMT Subject: RFR: 8272395: Bad HTML in JVMTI man page Message-ID: This fix adds escaping of invalid characters '[]' for 3 URLs defined in the jvmti.xml. This file is a base to generate the jvmti.html. ------------- Commit messages: - fix: 8272395: Bad HTML in JVMTI man page Changes: https://git.openjdk.java.net/jdk/pull/6730/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6730&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272395 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6730.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6730/head:pull/6730 PR: https://git.openjdk.java.net/jdk/pull/6730 From dholmes at openjdk.java.net Tue Dec 7 01:57:12 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 7 Dec 2021 01:57:12 GMT Subject: RFR: 8272395: Bad HTML in JVMTI man page In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 00:27:45 GMT, Serguei Spitsyn wrote: > This fix adds escaping of invalid characters '[]' for 3 URLs defined in the jvmti.xml. > This file is a base to generate the jvmti.html. Looks good and trivial. 
I never realized the browser was presenting a user-friendly version of the URL, showing the [], but if you copy and paste the URL they are converted to %5B%5D. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6730 From iris at openjdk.java.net Tue Dec 7 03:41:17 2021 From: iris at openjdk.java.net (Iris Clark) Date: Tue, 7 Dec 2021 03:41:17 GMT Subject: RFR: 8272395: Bad HTML in JVMTI man page In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 00:27:45 GMT, Serguei Spitsyn wrote: > This fix adds escaping of invalid characters '[]' for 3 URLs defined in the jvmti.xml. > This file is a base to generate the jvmti.html. Marked as reviewed by iris (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6730 From duke at openjdk.java.net Tue Dec 7 05:48:54 2021 From: duke at openjdk.java.net (xpbob) Date: Tue, 7 Dec 2021 05:48:54 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v6] In-Reply-To: References: Message-ID: > Unsafe is used in many Java frameworks. > When the framework has an unsafe memory leak, there is no way to know what code is causing it. > Add unsafe allocation event to jfr. > Records the size and stack allocated.
> This event is off by default xpbob has updated the pull request incrementally with two additional commits since the last revision: - remove event in metadata - add java event ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6591/files - new: https://git.openjdk.java.net/jdk/pull/6591/files/b09c744d..3f1e41cb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=04-05 Stats: 315 lines in 11 files changed: 270 ins; 34 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/6591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591 PR: https://git.openjdk.java.net/jdk/pull/6591 From duke at openjdk.java.net Tue Dec 7 05:55:43 2021 From: duke at openjdk.java.net (xpbob) Date: Tue, 7 Dec 2021 05:55:43 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v7] In-Reply-To: References: Message-ID: <1aHjR7kkqm_55EndM2fD7SaBbbCzQXkCNqbNToMWW3U=.dcbd9d2f-b201-4410-b2fe-21979e74cd92@github.com> > Unsafe is used in many Java frameworks. > When the framework has an unsafe memory leak, there is no way to know what code is causing it. > Add unsafe allocation event to jfr. > Records the size and stack allocated.
> This event is off by default xpbob has updated the pull request incrementally with two additional commits since the last revision: - add new line for file - remove whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6591/files - new: https://git.openjdk.java.net/jdk/pull/6591/files/3f1e41cb..ee2f5654 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=05-06 Stats: 6 lines in 6 files changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591 PR: https://git.openjdk.java.net/jdk/pull/6591 From duke at openjdk.java.net Tue Dec 7 06:21:54 2021 From: duke at openjdk.java.net (xpbob) Date: Tue, 7 Dec 2021 06:21:54 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v8] In-Reply-To: References: Message-ID: > Unsafe is used in many Java frameworks. > When the framework has an unsafe memory leak, there is no way to know what code is causing it. > Add unsafe allocation event to jfr. > Records the size and stack allocated. > This event is off by default xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8277930 - add new line for file - remove whitespace - remove event in metadata - add java event - Merge branch 'openjdk:master' into JDK-8277930 - remove whitespace - add free and Reallocate event - Merge branch 'openjdk:master' into JDK-8277930 - 8277930: Add unsafe allocation event to jfr ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6591/files - new: https://git.openjdk.java.net/jdk/pull/6591/files/ee2f5654..bf3a9c28 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=06-07 Stats: 10004 lines in 535 files changed: 6371 ins; 1500 del; 2133 mod Patch: https://git.openjdk.java.net/jdk/pull/6591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591 PR: https://git.openjdk.java.net/jdk/pull/6591 From duke at openjdk.java.net Tue Dec 7 07:29:17 2021 From: duke at openjdk.java.net (xpbob) Date: Tue, 7 Dec 2021 07:29:17 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v8] In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 06:21:54 GMT, xpbob wrote: >> Unsafe is used in many Java frameworks. >> When the framework has an unsafe memory leak, there is no way to know what code is causing it. >> Add unsafe allocation event to jfr. >> Records the size and stack allocated. >> This event is off by default > > xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8277930 > - add new line for file > - remove whitespace > - remove event in metadata > - add java event > - Merge branch 'openjdk:master' into JDK-8277930 > - remove whitespace > - add free and Reallocate event > - Merge branch 'openjdk:master' into JDK-8277930 > - 8277930: Add unsafe allocation event to jfr
code without unsafe event
----
c.unsafe.UnsafeTest.mallocAndFree     8  avgt  20  66.237 ± 0.672  ns/op
c.unsafe.UnsafeTest.mallocAndFree    12  avgt  20  66.596 ± 0.286  ns/op
c.unsafe.UnsafeTest.mallocAndFree    16  avgt  20  66.219 ± 0.621  ns/op
c.unsafe.UnsafeTest.mallocAndFree    24  avgt  20  66.715 ± 0.329  ns/op
c.unsafe.UnsafeTest.mallocAndFree    32  avgt  20  65.959 ± 0.714  ns/op
c.unsafe.UnsafeTest.mallocAndFree  4096  avgt  20  71.858 ± 3.306  ns/op
code with java event
---
c.unsafe.UnsafeTest.mallocAndFree     8  avgt  20  66.705 ± 0.213  ns/op
c.unsafe.UnsafeTest.mallocAndFree    12  avgt  20  66.621 ± 0.340  ns/op
c.unsafe.UnsafeTest.mallocAndFree    16  avgt  20  66.693 ± 0.197  ns/op
c.unsafe.UnsafeTest.mallocAndFree    24  avgt  20  66.039 ± 0.683  ns/op
c.unsafe.UnsafeTest.mallocAndFree    32  avgt  20  66.035 ± 0.593  ns/op
c.unsafe.UnsafeTest.mallocAndFree  4096  avgt  20  71.809 ± 3.535  ns/op
------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From egahlin at openjdk.java.net Tue Dec 7 08:16:18 2021 From: egahlin at openjdk.java.net (Erik Gahlin) Date: Tue, 7 Dec 2021 08:16:18 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v8] In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 06:21:54 GMT, xpbob wrote: >> Unsafe is used in many Java frameworks. >> When the framework has an unsafe memory leak, there is no way to know what code is causing it. >> Add unsafe allocation event to jfr. >> Records the size and stack allocated. >> This event is off by default > > xpbob has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8277930 > - add new line for file > - remove whitespace > - remove event in metadata > - add java event > - Merge branch 'openjdk:master' into JDK-8277930 > - remove whitespace > - add free and Reallocate event > - Merge branch 'openjdk:master' into JDK-8277930 > - 8277930: Add unsafe allocation event to jfr Nice to see the runtime cost being eliminated when JFR is disabled. Each additional Java event adds startup cost, and we have been reluctant to add more JFR events until we have a solution for this. It's not just these events, but others as well. My plan is to reduce the startup cost by eliminating the need for the handler classes and weaving the bytecode at build time. The task is high priority as it blocks several enhancements. I've already done some prototyping, and the plan is to have this in place for JDK 19, but no guarantees. There will be a mechanism other than the Handlers class to avoid allocation. My suggestion is to revisit this PR when it has been checked in. ------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From sspitsyn at openjdk.java.net Tue Dec 7 08:20:17 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 7 Dec 2021 08:20:17 GMT Subject: RFR: 8272395: Bad HTML in JVMTI man page In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 00:27:45 GMT, Serguei Spitsyn wrote: > This fix adds escaping of invalid characters '[]' for 3 URLs defined in the jvmti.xml. > This file is a base to generate the jvmti.html. Thank you for the review, David and Iris!
------------- PR: https://git.openjdk.java.net/jdk/pull/6730 From sspitsyn at openjdk.java.net Tue Dec 7 08:20:17 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 7 Dec 2021 08:20:17 GMT Subject: Integrated: 8272395: Bad HTML in JVMTI man page In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 00:27:45 GMT, Serguei Spitsyn wrote: > This fix adds escaping of invalid characters '[]' for 3 URLs defined in the jvmti.xml. > This file is a base to generate the jvmti.html. This pull request has now been integrated. Changeset: e535cb3f Author: Serguei Spitsyn URL: https://git.openjdk.java.net/jdk/commit/e535cb3fbac11785cfdb43c9b6f73b2a38a621d6 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8272395: Bad HTML in JVMTI man page Reviewed-by: dholmes, iris ------------- PR: https://git.openjdk.java.net/jdk/pull/6730 From stuefe at openjdk.java.net Tue Dec 7 09:07:33 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 7 Dec 2021 09:07:33 GMT Subject: RFR: JDK-8278309: [windows] use of uninitialized OSThread::_state Message-ID: May I have reviews for this trivial fix. On Windows, we use `OSThread::_state` in `os::create_thread` before it has been initialized. This causes an assert to fire in `Thread::is_JavaThread_protected` (`assert(target->is_handshake_safe_for(current_thread)`) This only happens if the following is true: - We log os=info level, thereby firing the "Thread started.." log output the parent thread of a newly started child thread writes. Since JDK-8268773, we also print the thread name. `Thread::name()` uses `Thread::is_JavaThread_protected`, but on Windows, the thread state has not been set yet. 
- This is an assert, so only debug, but in debug newly malloced memory is poisoned with "F1F1F1F1...", which hides the error since `Thread::is_JavaThread_protected` compares the thread state like this: if (target->osthread() == NULL || target->osthread()->get_state() <= INITIALIZED) { return true; } and the compiler interprets the "F1F1F1F1" as a negative large value. Changing the init pattern to 0x01, or adding an explicit cast to unsigned, causes the assert to fire as soon as logging is switched on. --- Musings: I wondered whether we should change the thread state comparison to unsigned like this: if (target->osthread() == NULL || (unsigned)target->osthread()->get_state() <= INITIALIZED) { which would have shown the error right away after JDK-8268773. This is matter of taste though since one could say that at this point we expect the enum to be filled correctly with one of its values, and guarding against uninitialized memory should belong somewhere else, maybe in a debug-only verify function. -- Just my personal opinion, but `OSThread` could do with a bit of cleanup. E.g. using an initializer list to init its values, and possibly doing the per-platform-factoring out in a different way. That include-file-in-the-middle-of-class composition technique is terrible. It confuses IDEs and makes analyzing the code difficult, and the code is not really difficult in what it does. 
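To make the signed-versus-unsigned point concrete, here is a minimal C++ sketch. It is not the HotSpot code: the enum is a hypothetical three-value stand-in for the real thread state enum, the 32-bit signed underlying type is an assumption, and the helper names are invented for illustration. It fills a state value with a poison byte and evaluates both forms of the comparison.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the thread state enum; only the ordering of the
// first two values matters. The fixed signed 32-bit underlying type is an
// assumption made so that every bit pattern is a valid value.
enum ThreadState : std::int32_t { ALLOCATED, INITIALIZED, RUNNABLE };

// Signed comparison, in the same shape as the quoted check.
bool looks_unstarted_signed(ThreadState s) {
  return s <= INITIALIZED;
}

// The same comparison with both sides converted to unsigned.
bool looks_unstarted_unsigned(ThreadState s) {
  return static_cast<std::uint32_t>(s) <=
         static_cast<std::uint32_t>(INITIALIZED);
}

// Simulates reading a state field from memory filled with one poison byte.
ThreadState poisoned_state(unsigned char pattern) {
  ThreadState s;
  std::memset(&s, pattern, sizeof(s));
  return s;
}
```

With the poison byte 0xF1 the value is negative as a signed 32-bit integer, so the signed check passes and the uninitialized field goes unnoticed; the unsigned check rejects it. With 0x01 the value is a large positive number and both checks reject it.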
---- Tests: GHAs (twice, once with experimentally set init word to 1 to see that the patch works) ------------- Commit messages: - initialize osthread state on windows Changes: https://git.openjdk.java.net/jdk/pull/6734/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6734&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278309 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6734.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6734/head:pull/6734 PR: https://git.openjdk.java.net/jdk/pull/6734 From dholmes at openjdk.java.net Tue Dec 7 09:39:11 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 7 Dec 2021 09:39:11 GMT Subject: RFR: JDK-8278309: [windows] use of uninitialized OSThread::_state In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 06:10:20 GMT, Thomas Stuefe wrote: > May I have reviews for this trivial fix. > > On Windows, we use `OSThread::_state` in `os::create_thread` before it has been initialized. This causes an assert to fire in `Thread::is_JavaThread_protected` (`assert(target->is_handshake_safe_for(current_thread)`) > > This only happens if the following is true: > - We log os=info level, thereby firing the "Thread started.." log output the parent thread of a newly started child thread writes. Since JDK-8268773, we also print the thread name. `Thread::name()` uses `Thread::is_JavaThread_protected`, but on Windows, the thread state has not been set yet. > - This is an assert, so only debug, but in debug newly malloced memory is poisoned with "F1F1F1F1...", which hides the error since `Thread::is_JavaThread_protected` compares the thread state like this: > > if (target->osthread() == NULL || target->osthread()->get_state() <= INITIALIZED) { > return true; > } > > and the compiler interprets the "F1F1F1F1" as a negative large value. 
Changing the init pattern to 0x01, or adding an explicit cast to unsigned, causes the assert to fire as soon as logging is switched on. > > --- > > Musings: > > I wondered whether we should change the thread state comparison to unsigned like this: > > if (target->osthread() == NULL || (unsigned)target->osthread()->get_state() <= INITIALIZED) { > > which would have shown the error right away after JDK-8268773. This is matter of taste though since one could say that at this point we expect the enum to be filled correctly with one of its values, and guarding against uninitialized memory should belong somewhere else, maybe in a debug-only verify function. > > -- > > Just my personal opinion, but `OSThread` could do with a bit of cleanup. E.g. using an initializer list to init its values, and possibly doing the per-platform-factoring out in a different way. That include-file-in-the-middle-of-class composition technique is terrible. It confuses IDEs and makes analyzing the code difficult, and the code is not really difficult in what it does. > > ---- > > Tests: GHAs (twice, once with experimentally set init word to 1 to see that the patch works) Looks good. :) Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6734 From stuefe at openjdk.java.net Tue Dec 7 09:52:15 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 7 Dec 2021 09:52:15 GMT Subject: RFR: JDK-8278309: [windows] use of uninitialized OSThread::_state In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 09:36:08 GMT, David Holmes wrote: > Looks good. :) > > Thanks, David Thank you David! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6734 From roland at openjdk.java.net Tue Dec 7 10:30:14 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 7 Dec 2021 10:30:14 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote: > Arraycopy partial inlining is a C2 compiler technique that avoids stub > call overhead in small-sized arraycopy operations by generating masked > vector instructions. So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. > > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types. > > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. So AArch64 matcher > needs to know the vector element type to generate right masks. > > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. 
> > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. C2 platform independent code looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6444 From aph at openjdk.java.net Tue Dec 7 11:17:11 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 7 Dec 2021 11:17:11 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote: > Arraycopy partial inlining is a C2 compiler technique that avoids stub > call overhead in small-sized arraycopy operations by generating masked > vector instructions. So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. > > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types. > > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. So AArch64 matcher > needs to know the vector element type to generate right masks. 
> > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. > > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From shade at openjdk.java.net Tue Dec 7 11:32:16 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 7 Dec 2021 11:32:16 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v5] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 11:26:04 GMT, Andrew Haley wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace >> - Fix a comment >> - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace >> - More reviews >> - Review feedback >> - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace >> - Initial work: runs async-profiler successfully > > src/hotspot/cpu/zero/frame_zero.cpp line 139: > >> 137: assert(is_interpreted_frame(), "Not an interpreted frame"); >> 138: // These are reasonable sanity checks >> 139: if (fp() == 0 || (intptr_t(fp()) & (wordSize-1)) != 0) { > > Use `is_aligned()` here? Paging @theRealAph ;) ------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From tschatzl at openjdk.java.net Tue Dec 7 12:04:17 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 7 Dec 2021 12:04:17 GMT Subject: RFR: 8278146: G1: Rework VM_G1Concurrent VMOp to clearly identify it as pause [v2] In-Reply-To: References: Message-ID: On Mon, 6 Dec 2021 17:31:39 GMT, Aleksey Shipilev wrote: >> Our support engineers asked this: >> >>> I see these G1Concurrent safepoints in JDK17: >>> [0.064s][info][safepoint ] Safepoint "G1Concurrent", Time since last: 1666947 ns, Reaching >> safepoint: 79150 ns, At safepoint: 349999 ns, Total: 429149 ns >>> I've always thought that "concurrent" and "safepoint" are basically antonyms. >>> What is a G1Concurrent safepoint? How can a concurrent event require a safepoint? >> >> I agree that's confusing. 
This patch splits the VM_G1Concurrent into two exactly named VMOp-s, so that we get: >> >> >> [6.527s][info][gc ] GC(7) Pause Remark 64M->64M(224M) 218.847ms >> [6.527s][info][safepoint] Safepoint "G1PauseRemark", Time since last: 17493991 ns, Reaching safepoint: 506830 ns, At safepoint: 218950374 ns, Total: 219457204 ns >> [6.536s][info][gc ] GC(7) Pause Cleanup 71M->71M(224M) 0.177ms >> [6.536s][info][safepoint] Safepoint "G1PauseCleanup", Time since last: 8250157 ns, Reaching safepoint: 884967 ns, At safepoint: 223964 ns, Total: 1108931 ns >> [6.537s][info][gc ] GC(7) Concurrent Mark Cycle 247.051ms >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Review Thomas > - Merge branch 'master' into JDK-8278146-g1-concurrent-vmop > - Whitespace and touchups > - Basic implementation Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6677 From mdoerr at openjdk.java.net Tue Dec 7 12:47:48 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 7 Dec 2021 12:47:48 GMT Subject: RFR: 8253860: PPC: Relocation::pd_set_data_value conflates compressed oops and klasses [v2] In-Reply-To: References: Message-ID: > Casting narrow Klass pointer to a narrow Oop is problematic. Note that `NativeMovConstReg::set_narrow_oop` only supports narrow Oops. > It turns out, that the problematic code is unused. We never patch narrow Klass pointers in the instruction stream on PPC64 (`metadata_Relocation::pd_fix_value` has an empty implementation). In contrast to that, narrow Oops in the instructions stream always get patched when the nmethod gets installed (by `fix_oop_relocations`). 
This makes sense as Metadata doesn't get relocated, but Oops may be moved by GC and the instructions need to get the current value during nmethod installation. > Note that the initial constants we were using for narrow Oops in the instruction stream were not correct (Oop compression missing, not updated by GC). So, I think it's better to use 0 to avoid confusion. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add comment. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6716/files - new: https://git.openjdk.java.net/jdk/pull/6716/files/70e97ac7..02d03c06 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6716&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6716&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6716.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6716/head:pull/6716 PR: https://git.openjdk.java.net/jdk/pull/6716 From ddong at openjdk.java.net Tue Dec 7 12:51:16 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Tue, 7 Dec 2021 12:51:16 GMT Subject: RFR: 8277948: AArch64: Print the correct native stack if -XX:+PreserveFramePointer when crash In-Reply-To: References: Message-ID: On Mon, 6 Dec 2021 13:45:47 GMT, Andrew Haley wrote: >>> Thank you for this. I'll have a look. >>> >>> Stack unwinding on AArch64 C/C++ uses call frame information, which is in a separate section in the binary file. This allows the stack to be fully traced, even if there is no frame pointer. There is a library, libunwind, which does this. But that won't work with Java, which has its own way to do it >>> >>> It would be nice to get -XX:+PreserveFramePointer working correctly. >> >> Thanks for the response. 
>> >> I also noticed that a java method will reserve 2 words when this method makes a vm leaf call: >> >> >> aarch64.ad >> >> aarch64_enc_java_to_runtime >> >> Label retaddr; >> __ adr(rscratch2, retaddr); >> __ lea(rscratch1, RuntimeAddress(entry)); >> // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc() >> __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); >> __ blr(rscratch1); >> __ bind(retaddr); >> __ add(sp, sp, 2 * wordSize); >> >> >> >> >> MacroAssembler::call_VM_leaf_base >> >> >> stp(rscratch1, rmethod, Address(pre(sp, -2 * wordSize))); >> >> mov(rscratch1, entry_point); >> blr(rscratch1); >> if (retaddr) >> bind(*retaddr); >> >> ldp(rscratch1, rmethod, Address(post(sp, 2 * wordSize))); >> >> >> >> I haven't figured out the specific purpose of this operation, but I think it will make the logic of stack walking more complicated. > >> > Thank you for this. I'll have a look. >> > Stack unwinding on AArch64 C/C++ uses call frame information, which is in a separate section in the binary file. This allows the stack to be fully traced, even if there is no frame pointer. There is a library, libunwind, which does this. But that won't work with Java, which has its own way to do it >> > It would be nice to get -XX:+PreserveFramePointer working correctly. >> >> Thanks for the response. >> >> I also noticed that a java method will reserve 2 words when this method makes a vm leaf call: > > It's not reserving anything, it's saving the PC for the stack unwinder. > >> ``` >> aarch64.ad >> >> aarch64_enc_java_to_runtime >> >> Label retaddr; >> __ adr(rscratch2, retaddr); >> __ lea(rscratch1, RuntimeAddress(entry)); >> // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc() >> __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); >> __ blr(rscratch1); >> __ bind(retaddr); >> __ add(sp, sp, 2 * wordSize); >> ``` > > I wrote it. If you look at `JavaFrameAnchor::capture_last_Java_pc()` you'll see > it being used. 
> >> ``` >> MacroAssembler::call_VM_leaf_base >> >> >> stp(rscratch1, rmethod, Address(pre(sp, -2 * wordSize))); >> >> mov(rscratch1, entry_point); >> blr(rscratch1); >> if (retaddr) >> bind(*retaddr); >> >> ldp(rscratch1, rmethod, Address(post(sp, 2 * wordSize))); >> ``` >> >> I haven't figured out the specific purpose of this operation, but I think it will make the logic of stack walking more complicated. > > All this is doing is saving `rmethod` (which is in a call-clobbered register) around a VM call. `retaddr` is saved for OOP maps. Hi @theRealAph , Sorry to disturb you again, I have one more question. Under the current implementation, if the number of parameters of callee exceeds the number of parameter registers, the parameters on the stack cannot be read correctly, right? aarch64.ad aarch64_enc_java_to_runtime Label retaddr; __ adr(rscratch2, retaddr); __ lea(rscratch1, RuntimeAddress(entry)); // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc() __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); __ blr(rscratch1); __ bind(retaddr); __ add(sp, sp, 2 * wordSize); ------------- PR: https://git.openjdk.java.net/jdk/pull/6597 From shade at openjdk.java.net Tue Dec 7 13:25:16 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 7 Dec 2021 13:25:16 GMT Subject: RFR: JDK-8278309: [windows] use of uninitialized OSThread::_state In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 06:10:20 GMT, Thomas Stuefe wrote: > May I have reviews for this trivial fix. > > On Windows, we use `OSThread::_state` in `os::create_thread` before it has been initialized. This causes an assert to fire in `Thread::is_JavaThread_protected` (`assert(target->is_handshake_safe_for(current_thread)`) > > This only happens if the following is true: > - We log os=info level, thereby firing the "Thread started.." log output the parent thread of a newly started child thread writes. Since JDK-8268773, we also print the thread name. 
`Thread::name()` uses `Thread::is_JavaThread_protected`, but on Windows, the thread state has not been set yet. > - This is an assert, so only debug, but in debug newly malloced memory is poisoned with "F1F1F1F1...", which hides the error since `Thread::is_JavaThread_protected` compares the thread state like this: > > if (target->osthread() == NULL || target->osthread()->get_state() <= INITIALIZED) { > return true; > } > > and the compiler interprets the "F1F1F1F1" as a negative large value. Changing the init pattern to 0x01, or adding an explicit cast to unsigned, causes the assert to fire as soon as logging is switched on. > > --- > > Musings: > > I wondered whether we should change the thread state comparison to unsigned like this: > > if (target->osthread() == NULL || (unsigned)target->osthread()->get_state() <= INITIALIZED) { > > which would have shown the error right away after JDK-8268773. This is matter of taste though since one could say that at this point we expect the enum to be filled correctly with one of its values, and guarding against uninitialized memory should belong somewhere else, maybe in a debug-only verify function. > > -- > > Just my personal opinion, but `OSThread` could do with a bit of cleanup. E.g. using an initializer list to init its values, and possibly doing the per-platform-factoring out in a different way. That include-file-in-the-middle-of-class composition technique is terrible. It confuses IDEs and makes analyzing the code difficult, and the code is not really difficult in what it does. > > ---- > > Tests: GHAs (twice, once with experimentally set init word to 1 to see that the patch works) This looks fine as the limited fix. But I do wonder if the initial state `ALLOCATED` should be stamped right in the `OSThread::OSThread` constructor. 
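To make the signed-versus-unsigned point above concrete, here is a minimal standalone sketch. It assumes a simplified three-value state enum and a hypothetical helper `state_from_fill` that mimics reading a field out of pattern-filled memory; none of this is HotSpot code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simplified stand-in for the OSThread state enum (assumption: only the
// relative order of ALLOCATED and INITIALIZED matters for this sketch).
enum ThreadState : std::int32_t { ALLOCATED = 0, INITIALIZED = 1, RUNNABLE = 2 };

// Hypothetical helper: the 32-bit value a state field would hold if its
// memory had been filled with `fill` and never assigned.
inline std::int32_t state_from_fill(unsigned char fill) {
  unsigned char raw[sizeof(std::int32_t)];
  std::memset(raw, fill, sizeof(raw));
  std::int32_t value;
  std::memcpy(&value, raw, sizeof(value));
  return value;
}

// Signed comparison, as in the quoted check: a negative garbage value
// passes for "not yet started".
inline bool unstarted_signed(std::int32_t state) {
  return state <= INITIALIZED;
}

// Unsigned comparison, as in the "Musings" variant: the same garbage
// becomes a very large value and fails the check.
inline bool unstarted_unsigned(std::int32_t state) {
  return static_cast<std::uint32_t>(state) <=
         static_cast<std::uint32_t>(INITIALIZED);
}
```

With a fill byte of 0xF1 the signed check accepts the garbage value while the unsigned one rejects it; with a fill byte of 0x01 both reject it, which matches the observation that changing the init pattern exposes the bug.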
Also, I see other platforms, for example, Linux does: // set the correct thread state osthread->set_thread_type(thr_type); // Initial state is ALLOCATED but not INITIALIZED osthread->set_state(ALLOCATED); Not sure how safe it is to add `set_thread_type`, but matching the comment for `set_state` is probably in order. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6734 From stuefe at openjdk.java.net Tue Dec 7 13:34:16 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 7 Dec 2021 13:34:16 GMT Subject: RFR: JDK-8278309: [windows] use of uninitialized OSThread::_state In-Reply-To: References: Message-ID: <0L92OzHnrR9WJgPsI6QzGn1RV3VKjrCpohf86NdpsZ0=.9136fdbf-ca6d-4b14-a2cd-1a9c2b1ab2b1@github.com> On Tue, 7 Dec 2021 13:22:27 GMT, Aleksey Shipilev wrote: > This looks fine as the limited fix. > > But I do wonder if the initial state `ALLOCATED` should be stamped right in the `OSThread::OSThread` constructor. Oh totally. This makes me itch. But I wanted a minimal patch to downport, and knowing me once I start touching OSThread I can't stop pulling threads. Ideally, I would like to get rid of OSThread altogether and merge it into Thread; there is no reason to have two physically separate structures like this. > Also, I see other platforms, for example, Linux does: > > ``` > // set the correct thread state > osthread->set_thread_type(thr_type); > > // Initial state is ALLOCATED but not INITIALIZED > osthread->set_state(ALLOCATED); > ``` > > Not sure how safe it is to add `set_thread_type`, but matching the comment for `set_state` is probably in order. Windows does not have a thread type. About the comment, sure, I can do that. Will do it before pushing. 
Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/6734 From stuefe at openjdk.java.net Tue Dec 7 13:39:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 7 Dec 2021 13:39:40 GMT Subject: RFR: JDK-8278309: [windows] use of uninitialized OSThread::_state [v2] In-Reply-To: References: Message-ID: <_BL3wZdGMwxkcAmG4-lEALk_QXfg6QSM4J3fSdGT6E8=.74855ee8-bb48-4742-85dd-16602611d28d@github.com> > May I have reviews for this trivial fix. > > On Windows, we use `OSThread::_state` in `os::create_thread` before it has been initialized. This causes an assert to fire in `Thread::is_JavaThread_protected` (`assert(target->is_handshake_safe_for(current_thread)`) > > This only happens if the following is true: > - We log os=info level, thereby firing the "Thread started.." log output the parent thread of a newly started child thread writes. Since JDK-8268773, we also print the thread name. `Thread::name()` uses `Thread::is_JavaThread_protected`, but on Windows, the thread state has not been set yet. > - This is an assert, so only debug, but in debug newly malloced memory is poisoned with "F1F1F1F1...", which hides the error since `Thread::is_JavaThread_protected` compares the thread state like this: > > if (target->osthread() == NULL || target->osthread()->get_state() <= INITIALIZED) { > return true; > } > > and the compiler interprets the "F1F1F1F1" as a negative large value. Changing the init pattern to 0x01, or adding an explicit cast to unsigned, causes the assert to fire as soon as logging is switched on. > > --- > > Musings: > > I wondered whether we should change the thread state comparison to unsigned like this: > > if (target->osthread() == NULL || (unsigned)target->osthread()->get_state() <= INITIALIZED) { > > which would have shown the error right away after JDK-8268773. 
This is matter of taste though since one could say that at this point we expect the enum to be filled correctly with one of its values, and guarding against uninitialized memory should belong somewhere else, maybe in a debug-only verify function. > > -- > > Just my personal opinion, but `OSThread` could do with a bit of cleanup. E.g. using an initializer list to init its values, and possibly doing the per-platform-factoring out in a different way. That include-file-in-the-middle-of-class composition technique is terrible. It confuses IDEs and makes analyzing the code difficult, and the code is not really difficult in what it does. > > ---- > > Tests: GHAs (twice, once with experimentally set init word to 1 to see that the patch works) Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: add comment mirroring linux version ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6734/files - new: https://git.openjdk.java.net/jdk/pull/6734/files/e85638e2..f7e72498 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6734&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6734&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6734.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6734/head:pull/6734 PR: https://git.openjdk.java.net/jdk/pull/6734 From phedlin at openjdk.java.net Tue Dec 7 13:44:21 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 7 Dec 2021 13:44:21 GMT Subject: Withdrawn: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: Message-ID: On Mon, 6 Dec 2021 14:09:07 GMT, Patric Hedlin wrote: > Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support. > > Implementation focusing on balance between small footprint and efficiency, trying to utilise a dual SIMD path (e.g. 
Neoverse N1) for the additional Ascii-check. > > Testing: tier1-6 > > Benchmarks, 18-b26 vs. update (ran on Aurora/Ampere Altra): > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ASCII..........72.23% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:BIG5...........70.38% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_15....67.81% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_16......... 3.72% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_8..........68.50% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:ASCII...........65.59% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:BIG5............60.59% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:ISO_8859_15.....63.79% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:UTF_16.......... 1.04% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:UTF_8...........63.33% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:ASCII............57.25% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:BIG5.............49.33% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:ISO_8859_15......61.37% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:UTF_16........... 
0.02% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:UTF_8............54.75% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ASCII............54.52% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:BIG5.............40.41% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_15......58.46% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_16...........-0.55% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_8............55.98% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ASCII............47.37% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:BIG5.............36.41% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_15......50.83% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_16........... 8.63% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_8............48.95% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ASCII.............17.55% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:BIG5..............18.58% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_15.......20.82% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_16............ 4.16% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_8.............18.44% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ASCII.............21.96% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:BIG5..............22.42% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_15.......30.27% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_16............-1.17% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_8.............35.99% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ASCII............. 
6.19% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:BIG5.............. 7.34% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_15....... 8.34% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_16............-0.46% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_8............. 6.80% This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6723 From shade at openjdk.java.net Tue Dec 7 13:49:21 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 7 Dec 2021 13:49:21 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability [v2] In-Reply-To: <_sVZGs-41bTQMUdjZMliWfhr9E4UopWGDlZONZ3-Ulc=.17dd56b1-a2fe-4790-abe0-3b15924b8b9f@github.com> References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> <_sVZGs-41bTQMUdjZMliWfhr9E4UopWGDlZONZ3-Ulc=.17dd56b1-a2fe-4790-abe0-3b15924b8b9f@github.com> Message-ID: On Thu, 2 Dec 2021 18:23:07 GMT, Zhengyu Gu wrote: >> NMT shutdown functionality is a remnant of its first implementation, which could consume excessive amount of memory, therefore, it needed capability to shut it self down to ensure health of JVM. This is no longer a case for current implementation. >> >> After JDK-8277946, there is no longer a use, so it can be removed. >> >> Test: >> - [x] hotspot_nmt >> - [x] tier1 with NMT on > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > tstuefe's comments Very nice cleanup. Some minor suggestions below, feel free to ignore them. src/hotspot/share/services/mallocTracker.cpp line 263: > 261: > 262: #ifdef ASSERT > 263: if (level >= NMT_summary) { Suggestion: if (level > NMT_off) { This way, we don't care where `NMT_summary` is in the enum. We enter this code on all paths when NMT is enabled? 
src/hotspot/share/services/memTracker.cpp line 129: > 127: // recursive calls in case NMT reporting itself crashes. > 128: if (Atomic::cmpxchg(&g_final_report_did_run, false, true) == false) { > 129: if (enabled()) { Looks to me, you can do `enabled() && Atomic::cmpxchg(...)`, thus saving a tiny bit of CPU cycles when NMT is disabled? src/hotspot/share/services/memTracker.hpp line 141: > 139: > 140: static inline bool enabled() { > 141: return _tracking_level >= NMT_summary; Suggestion: return _tracking_level > NMT_off; Same reason as above. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6640 From shade at openjdk.java.net Tue Dec 7 13:49:23 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 7 Dec 2021 13:49:23 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability [v2] In-Reply-To: References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> <_sVZGs-41bTQMUdjZMliWfhr9E4UopWGDlZONZ3-Ulc=.17dd56b1-a2fe-4790-abe0-3b15924b8b9f@github.com> Message-ID: On Tue, 7 Dec 2021 13:41:08 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: >> >> tstuefe's comments > > src/hotspot/share/services/mallocTracker.cpp line 263: > >> 261: >> 262: #ifdef ASSERT >> 263: if (level >= NMT_summary) { > > Suggestion: > > if (level > NMT_off) { > > > This way, we don't care where `NMT_summary` is in the enum. We enter this code on all paths when NMT is enabled? Actually, maybe even call to `enabled()` here? 
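The two suggestions in this review (compare against `NMT_off` rather than `NMT_summary`, and test `enabled()` before the atomic exchange) can be sketched together as follows. This is a simplified model, not the patch: it uses `std::atomic` in place of HotSpot's `Atomic::cmpxchg`, the enum values are illustrative, and `should_run_final_report` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// Illustrative tracking levels (assumption: NMT_off is the lowest value).
enum NMT_TrackingLevel { NMT_off = 0, NMT_summary = 1, NMT_detail = 2 };

// Guard flag so the final report runs at most once.
static std::atomic<bool> g_final_report_did_run{false};

// "Enabled" means anything above off, so the check does not depend on
// where NMT_summary sits in the enum.
inline bool enabled(NMT_TrackingLevel level) {
  return level > NMT_off;
}

// Returns true only for the first caller while tracking is enabled.
// Because && short-circuits, the atomic exchange is skipped entirely
// when tracking is off.
inline bool should_run_final_report(NMT_TrackingLevel level) {
  bool expected = false;
  return enabled(level) &&
         g_final_report_did_run.compare_exchange_strong(expected, true);
}
```

One side effect of the reordering in this sketch: a call made while tracking is off leaves the guard flag untouched, so it does not consume the single report slot.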
------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From stuefe at openjdk.java.net Tue Dec 7 13:53:17 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 7 Dec 2021 13:53:17 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability [v2] In-Reply-To: <_sVZGs-41bTQMUdjZMliWfhr9E4UopWGDlZONZ3-Ulc=.17dd56b1-a2fe-4790-abe0-3b15924b8b9f@github.com> References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> <_sVZGs-41bTQMUdjZMliWfhr9E4UopWGDlZONZ3-Ulc=.17dd56b1-a2fe-4790-abe0-3b15924b8b9f@github.com> Message-ID: <3NLbmegsJ8GvH8dUu8fhAKgiFrlKpL3hYfnP7wQt8uo=.7162868c-1612-4533-b9b8-cecf877a9358@github.com> On Thu, 2 Dec 2021 18:23:07 GMT, Zhengyu Gu wrote: >> NMT shutdown functionality is a remnant of its first implementation, which could consume excessive amount of memory, therefore, it needed capability to shut it self down to ensure health of JVM. This is no longer a case for current implementation. >> >> After JDK-8277946, there is no longer a use, so it can be removed. >> >> Test: >> - [x] hotspot_nmt >> - [x] tier1 with NMT on > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > tstuefe's comments Let's ship this, I'm really interested in getting this into JDK 18 before ramp-down on Thursday. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6640 From zgu at openjdk.java.net Tue Dec 7 14:23:48 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 7 Dec 2021 14:23:48 GMT Subject: RFR: 8277990: NMT: Remove NMT shutdown capability [v3] In-Reply-To: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> References: <6wXhKaVBg6nKnJZDO0xCdBlQXMuo0jhEeAXeIo7ILns=.71dd69d2-e84e-4b24-9815-c98d32808b2b@github.com> Message-ID: > NMT shutdown functionality is a remnant of its first implementation, which could consume excessive amount of memory, therefore, it needed capability to shut it self down to ensure health of JVM. This is no longer a case for current implementation. > > After JDK-8277946, there is no longer a use, so it can be removed. > > Test: > - [x] hotspot_nmt > - [x] tier1 with NMT on Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Aleksey's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6640/files - new: https://git.openjdk.java.net/jdk/pull/6640/files/58d22421..30aa1a52 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6640&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6640&range=01-02 Stats: 6 lines in 3 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6640.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6640/head:pull/6640 PR: https://git.openjdk.java.net/jdk/pull/6640 From mdoerr at openjdk.java.net Thu Dec 9 17:09:21 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 9 Dec 2021 17:09:21 GMT Subject: Integrated: 8253860: PPC: Relocation::pd_set_data_value conflates compressed oops and klasses In-Reply-To: References: Message-ID: On Mon, 6 Dec 2021 09:29:45 GMT, Martin Doerr wrote: > Casting narrow Klass pointer to a narrow Oop is problematic. Note that `NativeMovConstReg::set_narrow_oop` only supports narrow Oops. 
> It turns out, that the problematic code is unused. We never patch narrow Klass pointers in the instruction stream on PPC64 (`metadata_Relocation::pd_fix_value` has an empty implementation). In contrast to that, narrow Oops in the instructions stream always get patched when the nmethod gets installed (by `fix_oop_relocations`). This makes sense as Metadata doesn't get relocated, but Oops may be moved by GC and the instructions need to get the current value during nmethod installation. > Note that the initial constants we were using for narrow Oops in the instruction stream were not correct (Oop compression missing, not updated by GC). So, I think it's better to use 0 to avoid confusion. This pull request has now been integrated. Changeset: 01b30bfa Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/01b30bfa99e95cf1e9209c8de1f3c3c762596708 Stats: 39 lines in 5 files changed: 1 ins; 24 del; 14 mod 8253860: PPC: Relocation::pd_set_data_value conflates compressed oops and klasses Reviewed-by: dlong, rrich ------------- PR: https://git.openjdk.java.net/jdk/pull/6716 From aph at openjdk.java.net Thu Dec 9 17:16:20 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 9 Dec 2021 17:16:20 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v8] In-Reply-To: References: Message-ID: On Thu, 2 Dec 2021 09:20:53 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. 
Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required.
> > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Fix up UseROPProtection flag make/autoconf/flags-cflags.m4 line 902: > 900: BRANCH_PROTECTION_CFLAGS="" > 901: UTIL_ARG_ENABLE(NAME: branch-protection, DEFAULT: auto, > 902: RESULT: USE_BRANCH_PROTECTION, AVAILABLE: $BRANCH_PROTECTION_AVAILABLE, What exactly is going on here? Is it that if the host compiler has "branch protection" supported, we will build with it on by default? if so, no, we don't want to do that. For now, branch protection should be an explicit opt in. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Thu Dec 9 23:43:35 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Thu, 9 Dec 2021 23:43:35 GMT Subject: RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 Message-ID: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it *will* overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. I also added a test case to detect this overrun. 
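Read literally, the overrun condition described above can be written as a small predicate, which may be handy when choosing input lengths for a regression test. The name `may_overrun` is hypothetical, and the predicate only restates the prose description; it is not derived from the stub code.

```cpp
#include <cassert>
#include <cstddef>

// Restates the description: inputs shorter than 64 encoded bytes are safe,
// and so are inputs whose length mod 64 is 16 or more. Everything else is
// a candidate for writing past the end of the output buffer.
inline bool may_overrun(std::size_t encoded_len) {
  return encoded_len >= 64 && (encoded_len % 64) < 16;
}
```

Under this reading, lengths such as 64, 79, and 143 are worth testing, while 63, 80, and 127 should be unaffected.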
------------- Commit messages: - Add buffer overrun check for decode - Add masked write Changes: https://git.openjdk.java.net/jdk/pull/6786/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6786&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273108 Stats: 12 lines in 2 files changed: 7 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6786.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6786/head:pull/6786 PR: https://git.openjdk.java.net/jdk/pull/6786 From sviswanathan at openjdk.java.net Thu Dec 9 23:43:38 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 9 Dec 2021 23:43:38 GMT Subject: RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> References: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> Message-ID: On Thu, 9 Dec 2021 22:43:28 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it *will* overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6264: > 6262: __ jcc(Assembler::lessEqual, L_finalBit); > 6263: > 6264: __ mov64(rax, 0x0000ffffffffffff); The constant should have an l suffix. 
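As background to the suffix question: in C++ an unsuffixed hexadecimal literal takes the first of `int`, `unsigned int`, `long`, `unsigned long`, `long long`, `unsigned long long` that can represent its value, so a 48-bit constant is already at least 64 bits wide without any suffix. The sketch below checks that at compile time and also counts the set bits in the mask. The names `kStoreMask`, `popcount64`, and `decoded_bytes` are hypothetical, and the link between the 48 set bits and the 48 bytes decoded from 64 encoded characters is an assumption about the "Add masked write" commit, not something stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// The literal from the review comment, written without a suffix.
constexpr auto kStoreMask = 0x0000ffffffffffff;

// A value needing 48 bits cannot fit in a 32-bit type, so the deduced
// type is 64 bits or wider on common platforms.
static_assert(sizeof(kStoreMask) >= 8, "literal should be at least 64 bits wide");

// Count set bits by repeatedly clearing the lowest one (needs C++14 or
// later because of the loop in a constexpr function).
constexpr int popcount64(std::uint64_t v) {
  int n = 0;
  while (v != 0) {
    v &= v - 1;
    ++n;
  }
  return n;
}

// Base64 maps every 4 encoded characters to 3 decoded bytes.
constexpr int decoded_bytes(int encoded_chars) {
  return encoded_chars / 4 * 3;
}

static_assert(popcount64(kStoreMask) == 48, "mask has 48 bits set");
static_assert(decoded_bytes(64) == 48, "64 encoded chars decode to 48 bytes");
```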
------------- PR: https://git.openjdk.java.net/jdk/pull/6786 From duke at openjdk.java.net Thu Dec 9 23:54:12 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Thu, 9 Dec 2021 23:54:12 GMT Subject: RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: References: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> Message-ID: On Thu, 9 Dec 2021 23:10:00 GMT, Sandhya Viswanathan wrote: >> The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it *will* overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. >> >> I also added a test case to detect this overrun. > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6264: > >> 6262: __ jcc(Assembler::lessEqual, L_finalBit); >> 6263: >> 6264: __ mov64(rax, 0x0000ffffffffffff); > > The constant should have an l suffix. I do not believe this is necessary. There are multiple occurrences of mov64()s without the `l` suffix. 
For example, lines 687-688: __ mov64(c_rarg3, 0x8000000000000000); __ mov64(rax, 0x7fffffffffffffff); ------------- PR: https://git.openjdk.java.net/jdk/pull/6786 From sviswanathan at openjdk.java.net Fri Dec 10 00:00:17 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 10 Dec 2021 00:00:17 GMT Subject: RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: References: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> Message-ID: On Thu, 9 Dec 2021 23:50:52 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6264: >> >>> 6262: __ jcc(Assembler::lessEqual, L_finalBit); >>> 6263: >>> 6264: __ mov64(rax, 0x0000ffffffffffff); >> >> The constant should have an l suffix. > > I do not believe this is necessary. There are multiple occurrences of mov64()s without the `l` suffix. For example, lines 687-688: > > __ mov64(c_rarg3, 0x8000000000000000); > __ mov64(rax, 0x7fffffffffffffff); You are right, the code looks good. ------------- PR: https://git.openjdk.java.net/jdk/pull/6786 From sviswanathan at openjdk.java.net Fri Dec 10 00:04:12 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 10 Dec 2021 00:04:12 GMT Subject: RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> References: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> Message-ID: On Thu, 9 Dec 2021 22:43:28 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. 
So the case where it *will* overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. @asgibbons The change looks good to me. Could you please create this PR against JDK 18 (https://github.com/openjdk/jdk18)? ------------- PR: https://git.openjdk.java.net/jdk/pull/6786 From kvn at openjdk.java.net Fri Dec 10 00:11:12 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 10 Dec 2021 00:11:12 GMT Subject: RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> References: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> Message-ID: On Thu, 9 Dec 2021 22:43:28 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it *will* overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. Yes, a new PR has to be filed against the jdk18 repo pointed to by Sandhya, because we need to fix it in JDK 18. After integration, the fix will be automatically pushed into JDK 19 (current repo).
------------- PR: https://git.openjdk.java.net/jdk/pull/6786 From duke at openjdk.java.net Fri Dec 10 00:23:14 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Fri, 10 Dec 2021 00:23:14 GMT Subject: RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> References: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> Message-ID: <3vmsCM9a-EVn58115yZR6TU4yKCDp_XmfUr1r9F38T4=.112cf3ad-3968-43e4-ad3e-d7b97af6438f@github.com> On Thu, 9 Dec 2021 22:43:28 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it *will* overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. I just created a PR (https://github.com/openjdk/jdk18/pull/4) on the jdk-18 branch. Thanks for the heads-up, ------------- PR: https://git.openjdk.java.net/jdk/pull/6786 From duke at openjdk.java.net Fri Dec 10 00:34:55 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Fri, 10 Dec 2021 00:34:55 GMT Subject: [jdk18] RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 Message-ID: The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it will overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. I also added a test case to detect this overrun. 
------------- Commit messages: - Apply Base64 buffer overrun fix to JDK 18 Changes: https://git.openjdk.java.net/jdk18/pull/4/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=4&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273108 Stats: 12 lines in 2 files changed: 7 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk18/pull/4.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/4/head:pull/4 PR: https://git.openjdk.java.net/jdk18/pull/4 From kbarrett at openjdk.java.net Fri Dec 10 00:41:19 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 10 Dec 2021 00:41:19 GMT Subject: RFR: 8278146: G1: Rework VM_G1Concurrent VMOp to clearly identify it as pause [v2] In-Reply-To: References: Message-ID: On Mon, 6 Dec 2021 17:31:39 GMT, Aleksey Shipilev wrote: >> Our support engineers asked this: >> >>> I see these G1Concurrent safepoints in JDK17: >>> [0.064s][info][safepoint ] Safepoint "G1Concurrent", Time since last: 1666947 ns, Reaching >> safepoint: 79150 ns, At safepoint: 349999 ns, Total: 429149 ns >>> I've always thought that "concurrent" and "safepoint" are basically antonyms. >>> What is a G1Concurrent safepoint? How can a concurrent event require a safepoint? >> >> I agree that's confusing. 
This patch splits the VM_G1Concurrent into two exactly named VMOp-s, so that we get: >> >> >> [6.527s][info][gc ] GC(7) Pause Remark 64M->64M(224M) 218.847ms >> [6.527s][info][safepoint] Safepoint "G1PauseRemark", Time since last: 17493991 ns, Reaching safepoint: 506830 ns, At safepoint: 218950374 ns, Total: 219457204 ns >> [6.536s][info][gc ] GC(7) Pause Cleanup 71M->71M(224M) 0.177ms >> [6.536s][info][safepoint] Safepoint "G1PauseCleanup", Time since last: 8250157 ns, Reaching safepoint: 884967 ns, At safepoint: 223964 ns, Total: 1108931 ns >> [6.537s][info][gc ] GC(7) Concurrent Mark Cycle 247.051ms >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Review Thomas > - Merge branch 'master' into JDK-8278146-g1-concurrent-vmop > - Whitespace and touchups > - Basic implementation Looks good. Regarding suggestions to use `override`, I don't need a re-review if you make those changes. src/hotspot/share/gc/g1/g1VMOperations.hpp line 102: > 100: virtual bool doit_prologue(); > 101: virtual void doit_epilogue(); > 102: virtual void doit(); Consider changing all of these to use `override`. src/hotspot/share/gc/g1/g1VMOperations.hpp line 109: > 107: VM_G1PauseRemark() : VM_G1PauseConcurrent("Pause Remark") { } > 108: virtual VMOp_Type type() const { return VMOp_G1PauseRemark; } > 109: virtual void work(); Consider changing these to use `override`. src/hotspot/share/gc/g1/g1VMOperations.hpp line 116: > 114: VM_G1PauseCleanup() : VM_G1PauseConcurrent("Pause Cleanup") { } > 115: virtual VMOp_Type type() const { return VMOp_G1PauseCleanup; } > 116: virtual void work(); Consider changing these to use `override`. ------------- Marked as reviewed by kbarrett (Reviewer). 
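For readers unfamiliar with the `override` suggestion, here is a small standalone sketch. The classes `Operation` and `PauseOperation` are hypothetical stand-ins, not the real `VM_Operation` hierarchy from g1VMOperations.hpp; the sketch only shows that `override` makes the compiler reject a member whose signature no longer matches a base-class virtual.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a VM operation base class.
class Operation {
public:
  virtual ~Operation() = default;
  virtual bool doit_prologue() { return true; }
  virtual void doit() = 0;
  virtual std::string name() const { return "Operation"; }
};

// Hypothetical derived operation. Each member marked 'override' must
// match a virtual in the base class, or compilation fails.
class PauseOperation : public Operation {
  int _runs = 0;
public:
  bool doit_prologue() override { return true; }
  void doit() override { _runs++; }
  std::string name() const override { return "Pause"; }
  // std::string name() override;  // error: base version is const
  int runs() const { return _runs; }
};

// Runs an operation through the base-class interface.
inline void execute(Operation& op) {
  if (op.doit_prologue()) {
    op.doit();
  }
}

// Small self-check: the derived versions are reached via the base type.
inline void self_check() {
  PauseOperation op;
  execute(op);
  assert(op.runs() == 1);
  assert(static_cast<Operation&>(op).name() == "Pause");
}
```

With plain `virtual` on the derived members, the commented-out signature mismatch would silently declare a new, unrelated function instead of failing to compile.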
PR: https://git.openjdk.java.net/jdk/pull/6677 From mli at openjdk.java.net Fri Dec 10 01:08:14 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 10 Dec 2021 01:08:14 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: References: Message-ID: <4j6pdqEUewT8uil3vG0Oo8q-7n5HPb9zVLAzEVdix_4=.a673e2ad-ee74-407a-a53e-94f25e8769a0@github.com> On Wed, 8 Dec 2021 11:30:45 GMT, Hamlin Li wrote: > This is to get information about the pause time distribution (prepare (copy, sorting, ...), process (iterate) and cleanup) and region/objects/size statistics when processing evacuation failure objects in "Remove Self Forwards". This information will be helpful when optimizing the evacuation failure processing subsequently, and will also be helpful for users to analyze and troubleshoot in the future. > > > [10.917s][debug][gc,phases] GC(0) Restore Retained Regions (ms): ... > [10.917s][debug][gc,phases] GC(0) Retained Regions: ... > [10.917s][trace][gc,phases] GC(0) Prepare Retained Object Refs (ms): ... > [10.917s][trace][gc,phases] GC(0) Reformat Retained Regions (ms): ... > [10.917s][trace][gc,phases] GC(0) Retained Objects: ... > [10.917s][trace][gc,phases] GC(0) Retained Bytes: ... > [10.917s][trace][gc,phases] GC(0) Reclaim Memory (ms): ... > [10.917s][trace][gc,phases] GC(0) Used [Native] Memory: ... Seems the jdk mail service was down, ping ~ ------------- PR: https://git.openjdk.java.net/jdk/pull/6763 From sviswanathan at openjdk.java.net Fri Dec 10 01:21:36 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 10 Dec 2021 01:21:36 GMT Subject: [jdk18] RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 00:17:36 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. 
It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it will overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/4 From mli at openjdk.java.net Fri Dec 10 02:20:14 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 10 Dec 2021 02:20:14 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: References: Message-ID: On Wed, 8 Dec 2021 11:30:45 GMT, Hamlin Li wrote: > This is to get information about the pause time distribution (prepare (copy, sorting, ...), process (iterate) and cleanup) and region/objects/size statistics when processing evacuation failure objects in "Remove Self Forwards". This information will be helpful when optimizing the evacuation failure processing subsequently, and will also be helpful for users to analyze and troubleshoot in the future. > > > [10.917s][debug][gc,phases] GC(0) Restore Retained Regions (ms): ... > [10.917s][debug][gc,phases] GC(0) Retained Regions: ... > [10.917s][trace][gc,phases] GC(0) Prepare Retained Object Refs (ms): ... > [10.917s][trace][gc,phases] GC(0) Reformat Retained Regions (ms): ... > [10.917s][trace][gc,phases] GC(0) Retained Objects: ... > [10.917s][trace][gc,phases] GC(0) Retained Bytes: ... > [10.917s][trace][gc,phases] GC(0) Reclaim Memory (ms): ... > [10.917s][trace][gc,phases] GC(0) Used [Native] Memory: ... 
ping again ~ test mail service ------------- PR: https://git.openjdk.java.net/jdk/pull/6763 From dlong at openjdk.java.net Fri Dec 10 03:06:11 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 10 Dec 2021 03:06:11 GMT Subject: RFR: 8262134: compiler/uncommontrap/TestDeoptOOM.java failed with "guarantee(false) failed: wrong number of expression stack elements during deopt" In-Reply-To: <5pD6SNXU2pDZmAj-XhdOCmi7Fxj1K9vxeQqJxcRpBvc=.4efd3bf3-5286-49d1-9e0f-9946c03223fd@github.com> References: <5pD6SNXU2pDZmAj-XhdOCmi7Fxj1K9vxeQqJxcRpBvc=.4efd3bf3-5286-49d1-9e0f-9946c03223fd@github.com> Message-ID: <064CCgMDxJfl1j1YM5BArePtzyhX4ce6obFKBvt9FcQ=.7445baa6-a0dc-4c23-9840-006dcb22d2c1@github.com> On Wed, 8 Dec 2021 23:49:25 GMT, Dean Long wrote: > C1 patching stubs use the Unpack_reexecute deoptimization type, but if an exception is thrown, that information is lost, causing the VerifyStack logic to fail. Rather than relaxing the VerifyStack logic to accept Unpack_exception in this case, this change sets the reexecute flag on the patch stub call site. I guess I'll need to retarget this for 18. ------------- PR: https://git.openjdk.java.net/jdk/pull/6776 From dlong at openjdk.java.net Fri Dec 10 03:21:16 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 10 Dec 2021 03:21:16 GMT Subject: Withdrawn: 8262134: compiler/uncommontrap/TestDeoptOOM.java failed with "guarantee(false) failed: wrong number of expression stack elements during deopt" In-Reply-To: <5pD6SNXU2pDZmAj-XhdOCmi7Fxj1K9vxeQqJxcRpBvc=.4efd3bf3-5286-49d1-9e0f-9946c03223fd@github.com> References: <5pD6SNXU2pDZmAj-XhdOCmi7Fxj1K9vxeQqJxcRpBvc=.4efd3bf3-5286-49d1-9e0f-9946c03223fd@github.com> Message-ID: On Wed, 8 Dec 2021 23:49:25 GMT, Dean Long wrote: > C1 patching stubs use the Unpack_reexecute deoptimization type, but if an exception is thrown, that information is lost, causing the VerifyStack logic to fail. 
Rather than relaxing the VerifyStack logic to accept Unpack_exception in this case, this change sets the reexecute flag on the patch stub call site. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6776 From kvn at openjdk.java.net Fri Dec 10 04:00:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 10 Dec 2021 04:00:14 GMT Subject: [jdk18] RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 00:17:36 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it will overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. Let me test it before approval. ------------- PR: https://git.openjdk.java.net/jdk18/pull/4 From sjohanss at openjdk.java.net Fri Dec 10 09:46:11 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 10 Dec 2021 09:46:11 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: References: Message-ID: <4vLuo83_cTUXs2z0L1k6vcvlkVgHD52MIGJqt4owApQ=.081c659e-b3dc-410e-be9c-64592f087bf2@github.com> On Wed, 8 Dec 2021 11:30:45 GMT, Hamlin Li wrote: > This is to get information about the pause time distribution (prepare (copy, sorting, ...), process (iterate) and cleanup) and region/objects/size statistics when processing evacuation failure objects in "Remove Self Forwards". This information will be helpful when optimizing the evacuation failure processing subsequently, and will also be helpful for users to analyze and troubleshoot in the future. > > > [10.917s][debug][gc,phases] GC(0) Restore Retained Regions (ms): ... 
> [10.917s][debug][gc,phases] GC(0) Retained Regions: ... > [10.917s][trace][gc,phases] GC(0) Prepare Retained Object Refs (ms): ... > [10.917s][trace][gc,phases] GC(0) Reformat Retained Regions (ms): ... > [10.917s][trace][gc,phases] GC(0) Retained Objects: ... > [10.917s][trace][gc,phases] GC(0) Retained Bytes: ... > [10.917s][trace][gc,phases] GC(0) Reclaim Memory (ms): ... > [10.917s][trace][gc,phases] GC(0) Used [Native] Memory: ... Haven't looked in detail at the patch yet, but I have a comment on the output. > ``` > [10.917s][debug][gc,phases] GC(0) Restore Retained Regions (ms): ... > [10.917s][debug][gc,phases] GC(0) Retained Regions: ... > [10.917s][trace][gc,phases] GC(0) Prepare Retained Object Refs (ms): ... > [10.917s][trace][gc,phases] GC(0) Reformat Retained Regions (ms): ... > [10.917s][trace][gc,phases] GC(0) Retained Objects: ... > [10.917s][trace][gc,phases] GC(0) Retained Bytes: ... > [10.917s][trace][gc,phases] GC(0) Reclaim Memory (ms): ... > [10.917s][trace][gc,phases] GC(0) Used [Native] Memory: ... > ``` I thought we agreed on having more things on debug level and using "Evacuation Failure Regions" for now (until we have more than one type of retained regions), like this: [10.917s][debug][gc,phases] GC(0) Restore Retained Regions (ms): ... [10.917s][debug][gc,phases] GC(0) Evacuation Failed Regions: ... [10.917s][debug][gc,phases] GC(0) Prepare Retained Object Refs (ms): ... [10.917s][debug][gc,phases] GC(0) Reformat Retained Regions (ms): ... [10.917s][debug][gc,phases] GC(0) Retained Objects: ... [10.917s][trace][gc,phases] GC(0) Retained Bytes: ... [10.917s][debug][gc,phases] GC(0) Reclaim Memory (ms): ... [10.917s][trace][gc,phases] GC(0) Used [Native] Memory: ... Possibly also put the regions count last, to keep the timings at the top. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6763 From mli at openjdk.java.net Fri Dec 10 10:45:16 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 10 Dec 2021 10:45:16 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: <4vLuo83_cTUXs2z0L1k6vcvlkVgHD52MIGJqt4owApQ=.081c659e-b3dc-410e-be9c-64592f087bf2@github.com> References: <4vLuo83_cTUXs2z0L1k6vcvlkVgHD52MIGJqt4owApQ=.081c659e-b3dc-410e-be9c-64592f087bf2@github.com> Message-ID: On Fri, 10 Dec 2021 09:42:47 GMT, Stefan Johansson wrote: > I thought we agreed on having more things on debug level and using "Evacuation Failure Regions" for now (until we have more than one type of retained regions), like this: Thanks Stefan. I misunderstood, will update it as "Evacuation Failure Regions". > > ``` > [10.917s][debug][gc,phases] GC(0) Restore Retained Regions (ms): ... > [10.917s][debug][gc,phases] GC(0) Evacuation Failed Regions: ... > [10.917s][debug][gc,phases] GC(0) Prepare Retained Object Refs (ms): ... > [10.917s][debug][gc,phases] GC(0) Reformat Retained Regions (ms): ... > [10.917s][debug][gc,phases] GC(0) Retained Objects: ... > [10.917s][trace][gc,phases] GC(0) Retained Bytes: ... > [10.917s][debug][gc,phases] GC(0) Reclaim Memory (ms): ... > [10.917s][trace][gc,phases] GC(0) Used [Native] Memory: ... > ``` > Currently, I implement it by adding the following phases: RestoreRetainedRegions, RestoreRetainedRegionsPrepare, RestoreRetainedRegionsReformat, RestoreRetainedRegionsReclaim, and by adding `RestoreRetainedRegionsObjects` and `RestoreRetainedRegionsBytes` as work items of the `RestoreRetainedRegionsReformat` phase, `RestoreRetainedRegionsReclaimUsedMemory` as a work item of `RestoreRetainedRegionsReclaim`, and `RestoreRetainedRegionsNum` as a work item of the `RestoreRetainedRegions` phase. I tried to put a phase and the work items under it at different log levels, but it seems the current phase time framework only supports logging a phase and its work items at the same log level. 
Maybe I missed something in the code, let me do some further investigation. > Possibly also put the regions count last, to keep the timings at the top. I tried this too. But it seems the current phase time framework only supports putting the WorkItems of a phase next to that phase, rather than separating them from it with other phases. I will investigate it further. ------------- PR: https://git.openjdk.java.net/jdk/pull/6763 From sjohanss at openjdk.java.net Fri Dec 10 11:03:13 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 10 Dec 2021 11:03:13 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: References: <4vLuo83_cTUXs2z0L1k6vcvlkVgHD52MIGJqt4owApQ=.081c659e-b3dc-410e-be9c-64592f087bf2@github.com> Message-ID: On Fri, 10 Dec 2021 10:41:42 GMT, Hamlin Li wrote: > I tried to put a phase and the work items under it at different log levels, but it seems the current phase time framework only supports logging a phase and its work items at the same log level. Maybe I missed something in the code, let me do some further investigation. I think you are correct, maybe Thomas knows a trick. Otherwise I guess we could separate the "counts" from the timing, but that is not optimal either. If not solvable in a good way, maybe we should just drop some of the trace counts. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6763 From duke at openjdk.java.net Fri Dec 10 11:06:17 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 10 Dec 2021 11:06:17 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v8] In-Reply-To: References: Message-ID: On Thu, 9 Dec 2021 17:12:45 GMT, Andrew Haley wrote: >> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix up UseROPProtection flag > > make/autoconf/flags-cflags.m4 line 902: > >> 900: BRANCH_PROTECTION_CFLAGS="" >> 901: UTIL_ARG_ENABLE(NAME: branch-protection, DEFAULT: auto, >> 902: RESULT: USE_BRANCH_PROTECTION, AVAILABLE: $BRANCH_PROTECTION_AVAILABLE, > > What exactly is going on here? Is it that if the host compiler has "branch protection" supported, we will build with it on by default? if so, no, we don't want to do that. For now, branch protection should be an explicit opt in. The idea was that the same JDK could be used on PAC and non PAC systems. I agree that right now it's probably better being explicit opt in. Will update. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Fri Dec 10 12:39:50 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 10 Dec 2021 12:39:50 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v9] In-Reply-To: References: Message-ID: <8qhvLwNTzv5KxwJo93xrYA3GQSAX9NJm24EmbqFc3l8=.ba92bad8-0983-4519-9255-6913569f2638@github.com> > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. 
If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. 
I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Default to building without branch-protection ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/995d8aa3..38c08ef5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From tschatzl at openjdk.java.net Fri Dec 10 12:43:15 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 10 Dec 2021 12:43:15 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: References: <4vLuo83_cTUXs2z0L1k6vcvlkVgHD52MIGJqt4owApQ=.081c659e-b3dc-410e-be9c-64592f087bf2@github.com> Message-ID: <-v00pdoQ_VAQG64AReMpSBIWPQcgyjS8p3cXDelCCiM=.6d0c0c5e-901a-41b4-a68d-9ade9e10bdb2@github.com> On Fri, 10 Dec 2021 11:00:31 GMT, Stefan Johansson wrote: > > I tried to put a phase and the work items under it at different log levels, but it seems the current phase time framework only supports logging a phase and its work items at the same log level. Maybe I missed something in the code, let me do some further investigation. > > I think you are correct, maybe Thomas knows a trick. Otherwise I guess we could separate the "counts" from the timing, but that is not optimal either. If not solvable in a good way, maybe we should just drop some of the trace counts. There is no way to have different levels for the work items than for the timing it is attached to. 
I overlooked that when proposing this layout. I would like to just keep things the way they are though then, except moving the `Evacuation Failure Regions` to the bottom, as imho the timing is more interesting typically (or in other words, I'm typically only looking at the timings and if they are not as expected, I am digging into more details with the counts). Making the work items having a different log level than the master item could be done later. One remark: that `[Native]` in `Used [Native] Memory` has been meant as optional. :) I would just remove the `[Native]`... (did not look at the code yet). ------------- PR: https://git.openjdk.java.net/jdk/pull/6763 From aph at openjdk.java.net Fri Dec 10 13:25:21 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 10 Dec 2021 13:25:21 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v9] In-Reply-To: <8qhvLwNTzv5KxwJo93xrYA3GQSAX9NJm24EmbqFc3l8=.ba92bad8-0983-4519-9255-6913569f2638@github.com> References: <8qhvLwNTzv5KxwJo93xrYA3GQSAX9NJm24EmbqFc3l8=.ba92bad8-0983-4519-9255-6913569f2638@github.com> Message-ID: <5g4s-czewXTVHX027JYGJIXapsXAjGYmScabO9Nk8nA=.6bc890fd-9394-4b77-9c87-890c8364d980@github.com> On Fri, 10 Dec 2021 12:39:50 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. 
By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. 
> > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Default to building without branch-protection src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 419: > 417: if (UseROPProtection) { > 418: warning("UseROPProtection specified, but not supported on this CPU."); > 419: FLAG_SET_DEFAULT(UseROPProtection, false); Suggestion: FLAG_SET_DEFAULT(UseROPProtection, true); Given that the instructions used are in NOP space, this won't do any harm, and it will allow developers without PAC-enabled systems to see what code PAC would generate. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From mli at openjdk.java.net Fri Dec 10 14:52:14 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 10 Dec 2021 14:52:14 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: <-v00pdoQ_VAQG64AReMpSBIWPQcgyjS8p3cXDelCCiM=.6d0c0c5e-901a-41b4-a68d-9ade9e10bdb2@github.com> References: <4vLuo83_cTUXs2z0L1k6vcvlkVgHD52MIGJqt4owApQ=.081c659e-b3dc-410e-be9c-64592f087bf2@github.com> <-v00pdoQ_VAQG64AReMpSBIWPQcgyjS8p3cXDelCCiM=.6d0c0c5e-901a-41b4-a68d-9ade9e10bdb2@github.com> Message-ID: <1iQcFubcuobHaMRapVxkdGHeh5piqZ1rZcJ-WPpvDTo=.46d6436b-21b5-486f-a707-d6b1b41d063e@github.com> On Fri, 10 Dec 2021 12:39:43 GMT, Thomas Schatzl wrote: > There is no way to have different levels for the work items than for the timing it is attached to. I overlooked that when proposing this layout. I would like to just keep things the way they are though then, except moving the `Evacuation Failure Regions` to the bottom, as imho the timing is more interesting typically (or in other words, I'm typically only looking at the timings and if they are not as expected, I am digging into more details with the counts). Not sure if I missed something in the code, but seems if "Evacuation Failed Regions" is attached to "Restore Retained Regions (ms)", then there is no way to put other phases between them. 
> Making the work items having a different log level than the master item could be done later. Agree. > > One remark: that `[Native]` in `Used [Native] Memory` has been meant as optional. :) I would just remove the `[Native]`... (did not look at the code yet). Sure, I will update it as "Used Memory". :) ------------- PR: https://git.openjdk.java.net/jdk/pull/6763 From hseigel at openjdk.java.net Fri Dec 10 15:11:36 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 10 Dec 2021 15:11:36 GMT Subject: RFR: 8277481: Obsolete seldom used CDS flags Message-ID: Please review this change to obsolete deprecated CDS options UseSharedSpaces, RequireSharedSpaces, DynamicDumpSharedSpaces, and DumpSharedSpaces. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows and Mach5 tiers 3-5 on Linux x64 and Windows x64. The use of UseSharedSpaces in ps_core_common.c was tested on Mac OS x64 by temporarily removing serviceability/sa/ClhsdbPmap.java#core from the problem list. Thanks! Harold ------------- Commit messages: - fix typo - 8277481: Obsolete seldom used CDS flags Changes: https://git.openjdk.java.net/jdk/pull/6800/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6800&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277481 Stats: 151 lines in 13 files changed: 22 ins; 94 del; 35 mod Patch: https://git.openjdk.java.net/jdk/pull/6800.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6800/head:pull/6800 PR: https://git.openjdk.java.net/jdk/pull/6800 From duke at openjdk.java.net Fri Dec 10 15:14:47 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 10 Dec 2021 15:14:47 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v10] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. 
This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I?ve done some analysis as to how > these could also be used for attacks (which I didn?t want to post here). > These areas can be protected ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. 
These changes are > simple to make (they can be reduced to a type change in common code and > a few addition sign/auth calls in the backend), but there a lot of them > and the total code change is fairly large. I?m happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Remove BSD/Apple specific code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/38c08ef5..63f7515f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=08-09 Stats: 55 lines in 1 file changed: 0 ins; 51 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Fri Dec 10 15:19:21 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 10 Dec 2021 15:19:21 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v9] In-Reply-To: <5g4s-czewXTVHX027JYGJIXapsXAjGYmScabO9Nk8nA=.6bc890fd-9394-4b77-9c87-890c8364d980@github.com> References: <8qhvLwNTzv5KxwJo93xrYA3GQSAX9NJm24EmbqFc3l8=.ba92bad8-0983-4519-9255-6913569f2638@github.com> <5g4s-czewXTVHX027JYGJIXapsXAjGYmScabO9Nk8nA=.6bc890fd-9394-4b77-9c87-890c8364d980@github.com> Message-ID: <1O5M3usjaNAhxthALcIb-fLeJUMrNiLc9OQ5nrlXMkg=.d7c5dc66-61b9-4fb6-813e-e74f9d536baf@github.com> On Fri, 10 Dec 2021 13:21:46 GMT, Andrew Haley wrote: >> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: >> >> Default to building without branch-protection > > 
src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 419: > >> 417: if (UseROPProtection) { >> 418: warning("UseROPProtection specified, but not supported on this CPU."); >> 419: FLAG_SET_DEFAULT(UseROPProtection, false); > > Suggestion: > > FLAG_SET_DEFAULT(UseROPProtection, true); > > Given that the instructions used are in NOP space, this won't do any harm, and it will allow developers without PAC-enabled systems to see what code PAC would generate. Ok, I think that's fine. How about allowing it on a non-PAC system for development only? ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From iveresov at openjdk.java.net Fri Dec 10 17:11:11 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 10 Dec 2021 17:11:11 GMT Subject: RFR: 8262134: compiler/uncommontrap/TestDeoptOOM.java failed with "guarantee(false) failed: wrong number of expression stack elements during deopt" In-Reply-To: <5pD6SNXU2pDZmAj-XhdOCmi7Fxj1K9vxeQqJxcRpBvc=.4efd3bf3-5286-49d1-9e0f-9946c03223fd@github.com> References: <5pD6SNXU2pDZmAj-XhdOCmi7Fxj1K9vxeQqJxcRpBvc=.4efd3bf3-5286-49d1-9e0f-9946c03223fd@github.com> Message-ID: On Wed, 8 Dec 2021 23:49:25 GMT, Dean Long wrote: > C1 patching stubs use the Unpack_reexecute deoptimization type, but if an exception is thrown, that information is lost, causing the VerifyStack logic to fail. Rather than relaxing the VerifyStack logic to accept Unpack_exception in this case, this change sets the reexecute flag on the patch stub call site. Have you run tier7, where it runs with -XX:DeoptimizeALot? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6776 From jwilhelm at openjdk.java.net Fri Dec 10 18:13:18 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Fri, 10 Dec 2021 18:13:18 GMT Subject: RFR: Merge jdk18 Message-ID: Forwardport JDK 18 -> JDK 19 ------------- Commit messages: - Merge - 8277621: ARM32: multiple fastdebug failures with "bad AD file" after JDK-8276162 - 8278538: Test langtools/jdk/javadoc/tool/CheckManPageOptions.java fails after the manpage was updated - 8273179: Update nroff pages in JDK 18 before RC The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. Changes: https://git.openjdk.java.net/jdk/pull/6802/files Stats: 1142 lines in 30 files changed: 688 ins; 155 del; 299 mod Patch: https://git.openjdk.java.net/jdk/pull/6802.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6802/head:pull/6802 PR: https://git.openjdk.java.net/jdk/pull/6802 From jwilhelm at openjdk.java.net Fri Dec 10 18:46:17 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Fri, 10 Dec 2021 18:46:17 GMT Subject: Integrated: Merge jdk18 In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 17:51:31 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 18 -> JDK 19 This pull request has now been integrated. 
Changeset: 61736f81 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/61736f81fb4a20375c83d59e2b37a00aafb11107 Stats: 1142 lines in 30 files changed: 688 ins; 155 del; 299 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/6802 From kcr at openjdk.java.net Fri Dec 10 18:48:22 2021 From: kcr at openjdk.java.net (Kevin Rushforth) Date: Fri, 10 Dec 2021 18:48:22 GMT Subject: [jdk18] RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 00:17:36 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it will overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. @asgibbons I see that [JDK-8275427](https://bugs.openjdk.java.net/browse/JDK-8275427) is closed as a duplicate. Normally, duplicates are not listed in the commit message of a fix. ------------- PR: https://git.openjdk.java.net/jdk18/pull/4 From duke at openjdk.java.net Fri Dec 10 18:55:17 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Fri, 10 Dec 2021 18:55:17 GMT Subject: [jdk18] RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 18:45:02 GMT, Kevin Rushforth wrote: >> The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it will overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. 
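As a reading aid for the condition quoted above, here is a minimal Java sketch that only restates the three sentences of the description as a predicate. The class and method names (`Base64OverrunCondition`, `mayOverrun`) are hypothetical and are not part of the JDK or of the fix; the sketch says nothing about how the intrinsic itself is implemented.

```java
public class Base64OverrunCondition {
    // Hypothetical helper: restates the description only.
    // No overrun when the encoded length is below 64 bytes,
    // no overrun when (length mod 64) >= 16,
    // so the affected case is length >= 64 with (length mod 64) < 16.
    static boolean mayOverrun(int encodedLength) {
        return encodedLength >= 64 && encodedLength % 64 < 16;
    }

    public static void main(String[] args) {
        int[] samples = {16, 63, 64, 79, 80, 128, 143, 144};
        for (int len : samples) {
            System.out.println(len + " -> " + mayOverrun(len));
        }
    }
}
```

A regression test would presumably want encoded lengths on both sides of each boundary, for example 63/64 and 79/80.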
>> >> I also added a test case to detect this overrun. > > @asgibbons I see that [JDK-8275427](https://bugs.openjdk.java.net/browse/JDK-8275427) is closed as a duplicate. Normally, duplicates are not listed in the commit message of a fix. @kevinrushforth Thanks for the tip. I believe it was marked as duplicate after I made this PR. I'll keep this in mind for future PRs. ------------- PR: https://git.openjdk.java.net/jdk18/pull/4 From iklam at openjdk.java.net Fri Dec 10 19:10:15 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 10 Dec 2021 19:10:15 GMT Subject: RFR: 8277481: Obsolete seldom used CDS flags In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 15:01:29 GMT, Harold Seigel wrote: > Please review this change to obsolete deprecated CDS options UseSharedSpaces, RequireSharedSpaces, DynamicDumpSharedSpaces, and DumpSharedSpaces. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows and Mach5 tiers 3-5 on Linux x64 and Windows x64. > > The use of UseSharedSpaces in ps_core_common.c was tested on Mac OS x64 by temporarily removing serviceability/sa/ClhsdbPmap.java#core from the problem list. > > Thanks! Harold Looks good to me. Thanks for fixing this! ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6800 From ccheung at openjdk.java.net Fri Dec 10 19:41:16 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Fri, 10 Dec 2021 19:41:16 GMT Subject: RFR: 8277481: Obsolete seldom used CDS flags In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 15:01:29 GMT, Harold Seigel wrote: > Please review this change to obsolete deprecated CDS options UseSharedSpaces, RequireSharedSpaces, DynamicDumpSharedSpaces, and DumpSharedSpaces. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows and Mach5 tiers 3-5 on Linux x64 and Windows x64. 
> > The use of UseSharedSpaces in ps_core_common.c was tested on Mac OS x64 by temporarily removing serviceability/sa/ClhsdbPmap.java#core from the problem list. > > Thanks! Harold Looks good. Just one nit. src/jdk.hotspot.agent/share/native/libsaproc/ps_core_common.c line 303: > 301: useSharedSpacesAddr = lookup_symbol(ph, jvm_name, USE_SHARED_SPACES_SYM); > 302: if (useSharedSpacesAddr == 0) { > 303: print_debug("can't lookup 'UseSharedSpaces' symbol\n"); Maybe the `print_debug` at line 311 should also be updated from "flag" to "symbol"? ------------- Marked as reviewed by ccheung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6800 From hseigel at openjdk.java.net Fri Dec 10 19:49:48 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 10 Dec 2021 19:49:48 GMT Subject: RFR: 8277481: Obsolete seldom used CDS flags [v2] In-Reply-To: References: Message-ID: > Please review this change to obsolete deprecated CDS options UseSharedSpaces, RequireSharedSpaces, DynamicDumpSharedSpaces, and DumpSharedSpaces. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows and Mach5 tiers 3-5 on Linux x64 and Windows x64. > > The use of UseSharedSpaces in ps_core_common.c was tested on Mac OS x64 by temporarily removing serviceability/sa/ClhsdbPmap.java#core from the problem list. > > Thanks! 
Harold Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: fix print_debug() message ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6800/files - new: https://git.openjdk.java.net/jdk/pull/6800/files/3f6c6dee..601a678f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6800&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6800&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6800.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6800/head:pull/6800 PR: https://git.openjdk.java.net/jdk/pull/6800 From hseigel at openjdk.java.net Fri Dec 10 19:49:51 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 10 Dec 2021 19:49:51 GMT Subject: RFR: 8277481: Obsolete seldom used CDS flags [v2] In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 19:37:31 GMT, Calvin Cheung wrote: >> Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: >> >> fix print_debug() message > > src/jdk.hotspot.agent/share/native/libsaproc/ps_core_common.c line 303: > >> 301: useSharedSpacesAddr = lookup_symbol(ph, jvm_name, USE_SHARED_SPACES_SYM); >> 302: if (useSharedSpacesAddr == 0) { >> 303: print_debug("can't lookup 'UseSharedSpaces' symbol\n"); > > Maybe the `print_debug` at line 311 should also be updated from "flag" to "symbol"? fixed. Thanks for pointing it out. ------------- PR: https://git.openjdk.java.net/jdk/pull/6800 From duke at openjdk.java.net Fri Dec 10 21:14:36 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Fri, 10 Dec 2021 21:14:36 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 Message-ID: This JVM SpinPause can use different implementations of spin pauses. It uses `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. 
The `SpinWait` provides an instruction runner together with the description of the instruction and the instruction count. It can be used at places where generation of spin pauses is not possible, such as in the runtime SpinPause function. The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases where we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause. However JVM SpinPause might need fewer instructions than the intrinsic. To support such cases the instruction runner interface supports the `count` parameter. Testing results for fastdebug and release builds: - `gtest`: Passed - `tier1`...`tier4`: Passed - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed JVM SpinPause is used for the synchronised statements and can be benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`. Benchmarking results (number of samples per experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:

+-----------+-------------------+------------+-----------+-----------+----------+---------+
| CPU cores | Contended threads | Base ns/op | Error     | New ns/op | Error    | Diff    |
+-----------+-------------------+------------+-----------+-----------+----------+---------+
| 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
| 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
| 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
| 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
| 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
| 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
| 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
+-----------+-------------------+------------+-----------+-----------+----------+---------+

------------- Commit messages: - 8278241: Implement JVM SpinPause on linux-aarch64 Changes: https://git.openjdk.java.net/jdk/pull/6803/files Webrev: 
https://webrevs.openjdk.java.net/?repo=jdk&pr=6803&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278241 Stats: 138 lines in 5 files changed: 130 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/6803.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6803/head:pull/6803 PR: https://git.openjdk.java.net/jdk/pull/6803 From duke at openjdk.java.net Fri Dec 10 21:14:36 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Fri, 10 Dec 2021 21:14:36 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 21:02:53 GMT, Evgeny Astigeevich wrote: > This JVM SpinPause can use different implementations of spin pauses. It uses `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides an instruction runner together with the description of the instruction and the instruction count. It can be used at places where generation of spin pauses is not possible, such as in the runtime SpinPause function. > > The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases where we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause. However JVM SpinPause might need fewer instructions than the intrinsic. To support such cases the instruction runner interface supports the `count` parameter. > > Testing results for fastdebug and release builds: > - `gtest`: Passed > - `tier1`...`tier4`: Passed > - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed > > JVM SpinPause is used for the synchronised statements and can be benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`. 
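For readers less familiar with the Java side: to my understanding the `_onSpinWait()` intrinsic mentioned above backs `java.lang.Thread.onSpinWait()`, while JVM SpinPause is internal to the VM and not callable from Java. The sketch below shows only the Java-level spin-wait hint in a busy-wait loop; the class and method names (`SpinWaitSketch`, `spinUntilSet`) are made up for illustration and the code is not taken from the patch or from the benchmark.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinWaitSketch {
    // Busy-waits until the flag becomes true, issuing the spin-wait hint on
    // every iteration. Returns the number of iterations that were needed.
    static int spinUntilSet(AtomicBoolean flag) {
        int spins = 0;
        while (!flag.get()) {
            Thread.onSpinWait(); // hint to the runtime/CPU that we are spinning
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread setter = new Thread(() -> ready.set(true));
        setter.start();
        int spins = spinUntilSet(ready);
        setter.join();
        System.out.println("flag observed after " + spins + " spins");
    }
}
```

Which instruction the hint turns into on AArch64 is what the `OnSpinWaitInst`/`OnSpinWaitInstCount` options select, according to the description above.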
> > Benchmarking results (number of samples per experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | CPU cores | Contended threads | Base ns/op | Error     | New ns/op | Error    | Diff    |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+

@nick-arm @theRealAph @stooart-mon Hi, could you have a look please? ------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From kvn at openjdk.java.net Fri Dec 10 21:24:13 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 10 Dec 2021 21:24:13 GMT Subject: [jdk18] RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 00:17:36 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it will overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. You should close other bugs as duplicates if you think the fix applies to them. Also you don't need to list them in the PR because they are listed in JBS anyway. 
Testing takes a long time because, as the test's name says, it runs for 24 hours. I want to make sure the test passes with this fix. ------------- PR: https://git.openjdk.java.net/jdk18/pull/4 From iklam at openjdk.java.net Sat Dec 11 01:55:50 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Sat, 11 Dec 2021 01:55:50 GMT Subject: RFR: 8275731: CDS archived enums objects are recreated at runtime [v3] In-Reply-To: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> References: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> Message-ID: > **Background:** > > In the Java Language, Enums can be tested for equality, so the constants in an Enum type must be unique. Javac compiles an enum declaration like this: > > > public enum Day { SUNDAY, MONDAY ... } > > > to > > > public class Day extends java.lang.Enum { > public static final Day SUNDAY = new Day("SUNDAY"); > public static final Day MONDAY = new Day("MONDAY"); ... > } > > > With CDS archived heap objects, `Day::<clinit>` is executed twice: once during `java -Xshare:dump`, and once during normal JVM execution. If the archived heap objects reference one of the Enum constants created at dump time, we will violate the uniqueness requirements of the Enum constants at runtime. See the test case in the description of [JDK-8275731](https://bugs.openjdk.java.net/browse/JDK-8275731) > > **Fix:** > > During -Xshare:dump, if we discover that an Enum constant of type X is archived, we archive all constants of type X. At runtime, type X will skip the normal execution of `X::<clinit>`. Instead, we run `HeapShared::initialize_enum_klass()` to retrieve all the constants of X that were saved at dump time. > > This is safe as we know that `X::<clinit>` has no observable side effect -- it only creates the constants of type X, as well as the synthetic value `X::$VALUES`, which cannot be observed until X is fully initialized. 
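To make the uniqueness requirement from the background section concrete, here is a small Java sketch. It does not involve CDS at all: the nested `Day` class is a hand-written stand-in for an enum, and `recreate` is a hypothetical method that simulates a constant created by a second run of the class initializer. All names are illustrative assumptions.

```java
public class EnumIdentitySketch {
    // Hand-written stand-in for an enum class; not a real java.lang.Enum.
    static final class Day {
        static final Day SUNDAY = new Day("SUNDAY");
        final String name;

        private Day(String name) {
            this.name = name;
        }

        // Hypothetical: simulates the same constant being created a second
        // time, as if the class initializer had run again.
        static Day recreate(String name) {
            return new Day(name);
        }
    }

    // Typical enum-style check: identity comparison against the constant.
    static boolean isSunday(Day d) {
        return d == Day.SUNDAY;
    }

    public static void main(String[] args) {
        Day runtimeConstant = Day.SUNDAY;
        Day dumpTimeCopy = Day.recreate("SUNDAY"); // stands in for an archived reference
        System.out.println(isSunday(runtimeConstant));                        // true
        System.out.println(isSunday(dumpTimeCopy));                           // false
        System.out.println(dumpTimeCopy.name.equals(runtimeConstant.name));   // true
    }
}
```

The second object carries the same name but fails the identity check, which is the kind of mismatch the description says can occur when an archived object refers to a dump-time constant.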
> > **Verification:** > > To avoid future problems, I added a new tool, CDSHeapVerifier, to look for similar problems where the archived heap objects reference a static field that may be recreated at runtime. There are some manual steps involved, but I analyzed the potential problems found by the tool and they are all safe (after the current bug is fixed). See cdsHeapVerifier.cpp for gory details. An example trace of this tool can be found at https://bugs.openjdk.java.net/secure/attachment/97242/enum_warning.txt > > **Testing:** > > Passed Oracle CI tiers 1-4. Will run tier 5 as well. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into 8275731-heapshared-enum - added exclusions needed by "java -Xshare:dump -ea -esa" - Comments from @calvinccheung off-line - 8275731: CDS archived enums objects are recreated at runtime ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6653/files - new: https://git.openjdk.java.net/jdk/pull/6653/files/df0d3f88..6e160057 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6653&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6653&range=01-02 Stats: 24204 lines in 951 files changed: 16523 ins; 3176 del; 4505 mod Patch: https://git.openjdk.java.net/jdk/pull/6653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6653/head:pull/6653 PR: https://git.openjdk.java.net/jdk/pull/6653 From aph at openjdk.java.net Sat Dec 11 09:29:09 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 11 Dec 2021 09:29:09 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 21:07:13 GMT, Evgeny Astigeevich wrote: > @nick-arm @theRealAph @stooart-mon Hi, could you have a look please? 
This is way too complicated. I'd use `MacroAssembler::spin_wait()` to generate a stub and call it from `SpinPause`. ------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From aph at openjdk.java.net Sat Dec 11 09:33:12 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 11 Dec 2021 09:33:12 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v9] In-Reply-To: <1O5M3usjaNAhxthALcIb-fLeJUMrNiLc9OQ5nrlXMkg=.d7c5dc66-61b9-4fb6-813e-e74f9d536baf@github.com> References: <8qhvLwNTzv5KxwJo93xrYA3GQSAX9NJm24EmbqFc3l8=.ba92bad8-0983-4519-9255-6913569f2638@github.com> <5g4s-czewXTVHX027JYGJIXapsXAjGYmScabO9Nk8nA=.6bc890fd-9394-4b77-9c87-890c8364d980@github.com> <1O5M3usjaNAhxthALcIb-fLeJUMrNiLc9OQ5nrlXMkg=.d7c5dc66-61b9-4fb6-813e-e74f9d536baf@github.com> Message-ID: On Fri, 10 Dec 2021 15:16:19 GMT, Alan Hayward wrote: >> src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 419: >> >>> 417: if (UseROPProtection) { >>> 418: warning("UseROPProtection specified, but not supported on this CPU."); >>> 419: FLAG_SET_DEFAULT(UseROPProtection, false); >> >> Suggestion: >> >> FLAG_SET_DEFAULT(UseROPProtection, true); >> >> Given that the instructions used are in NOP space, this won't do any harm, and it will allow developers without PAC-enabled systems to see what code PAC would generate. > > Ok, I think that's fine. How about on a non pac system allowing it for development only ? Maybe. Mind you, a lot of the time I'm looking at the output from production systems. From a rather philosophical point of view, I assume that if the user of a computer asks for something that isn't going to break anything or confuse anyone, we should honour their request. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Sat Dec 11 14:08:12 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 11 Dec 2021 14:08:12 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v10] In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 15:14:47 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. 
If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Remove BSD/Apple specific code src/hotspot/cpu/aarch64/globals_aarch64.hpp line 122: > 120: "It cannot be used with OnSpinWaitInst=none.") \ > 121: range(1, 99) \ > 122: product(bool, UseROPProtection, false, \ Question: this is called "UseROPProtection", the configure option is called "enable-branch-protection", and GCC option is called "-mbranch-protection". This is confusing. I would have thought we would want the same name, and use it for all branch protection. So why is this not "UseBranchProtection"? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From fweimer at openjdk.java.net Sat Dec 11 15:42:13 2021 From: fweimer at openjdk.java.net (Florian Weimer) Date: Sat, 11 Dec 2021 15:42:13 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v10] In-Reply-To: References: Message-ID: On Sat, 11 Dec 2021 14:05:12 GMT, Andrew Haley wrote: >> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove BSD/Apple specific code > > src/hotspot/cpu/aarch64/globals_aarch64.hpp line 122: > >> 120: "It cannot be used with OnSpinWaitInst=none.") \ >> 121: range(1, 99) \ >> 122: product(bool, UseROPProtection, false, \ > > Question: this is called "UseROPProtection", the configure option is called "enable-branch-protection", and GCC option is called "-mbranch-protection". This is confusing. I would have thought we would want the same name, and use it for all branch protection. So why is this not "UseBranchProtection"? `-mbranch-protection` switches on both PAC-RET and BTI. This PR only covers a use of PAC that looks very ROP-focused to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Sun Dec 12 10:22:16 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sun, 12 Dec 2021 10:22:16 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v10] In-Reply-To: References: Message-ID: On Sat, 11 Dec 2021 15:39:24 GMT, Florian Weimer wrote: >> src/hotspot/cpu/aarch64/globals_aarch64.hpp line 122: >> >>> 120: "It cannot be used with OnSpinWaitInst=none.") \ >>> 121: range(1, 99) \ >>> 122: product(bool, UseROPProtection, false, \ >> >> Question: this is called "UseROPProtection", the configure option is called "enable-branch-protection", and GCC option is called "-mbranch-protection". This is confusing. 
I would have thought we would want the same name, and use it for all branch protection. So why is this not "UseBranchProtection"? > > `-mbranch-protection` switches on both PAC-RET and BTI. This PR only covers a use of PAC that looks very ROP-focused to me. True, because we don't (yet) support BTI. Is there any point having two separate flags for BTI and PAC-RET? If someone wants one, they'll very likely want the other, won't they? ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From kvn at openjdk.java.net Sun Dec 12 16:09:24 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 12 Dec 2021 16:09:24 GMT Subject: [jdk18] RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 00:17:36 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it will overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. All testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/4 From duke at openjdk.java.net Sun Dec 12 16:12:21 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Sun, 12 Dec 2021 16:12:21 GMT Subject: [jdk18] Integrated: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 00:17:36 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. 
So the case where it will overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. This pull request has now been integrated. Changeset: 9a1bbaf8 Author: Scott Gibbons Committer: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk18/commit/9a1bbaf8db0e869ab76be8ab1bd0ddeb23693e7e Stats: 12 lines in 2 files changed: 7 ins; 0 del; 5 mod 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 8272809: JFR thread sampler SI_KERNEL SEGV in metaspace::VirtualSpaceList::contains Reviewed-by: sviswanathan, kvn ------------- PR: https://git.openjdk.java.net/jdk18/pull/4 From dholmes at openjdk.java.net Sun Dec 12 23:20:10 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 12 Dec 2021 23:20:10 GMT Subject: RFR: 8277481: Obsolete seldom used CDS flags [v2] In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 19:49:48 GMT, Harold Seigel wrote: >> Please review this change to obsolete deprecated CDS options UseSharedSpaces, RequireSharedSpaces, DynamicDumpSharedSpaces, and DumpSharedSpaces. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows and Mach5 tiers 3-5 on Linux x64 and Windows x64. >> >> The use of UseSharedSpaces in ps_core_common.c was tested on Mac OS x64 by temporarily removing serviceability/sa/ClhsdbPmap.java#core from the problem list. >> >> Thanks! Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > fix print_debug() message LGTM! Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6800 From stuefe at openjdk.java.net Mon Dec 13 06:14:29 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 13 Dec 2021 06:14:29 GMT Subject: RFR: JDK-8278585: Drop unused code from OSThread Message-ID: Gentle cleanup of OSThread, removes some unused functionality. 
No functional changes. - both start proc and start param parameters are unused (when we create threads we always start with `thread_native_entry` as thread procedure). Removed members and constructor arguments (had always been called with NULL) - `valid_reposition_failure()` is unused, returns always false on all platforms. Used to return true on Solaris, but I could not find a caller even going back to jdk-8. - removed `thread_id_offset`, `thread_id_size`, both had been used at one time by C1 but not anymore. - Finally, removed the platform-independent stub for the windows-only `set_interrupted()`; replaced it with WINDOWS_ONLY at the only two places where it is invoked. Matter of taste, but I find this actually clearer than having a single-platform function looking like a generic one. Thanks, Thomas ------------- Commit messages: - remove unused members from OSThread Changes: https://git.openjdk.java.net/jdk/pull/6809/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6809&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278585 Stats: 57 lines in 12 files changed: 0 ins; 44 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/6809.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6809/head:pull/6809 PR: https://git.openjdk.java.net/jdk/pull/6809 From dholmes at openjdk.java.net Mon Dec 13 06:47:14 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 13 Dec 2021 06:47:14 GMT Subject: RFR: JDK-8278585: Drop unused code from OSThread In-Reply-To: References: Message-ID: On Sun, 12 Dec 2021 07:12:19 GMT, Thomas Stuefe wrote: > Gentle cleanup of OSThread, removes some unused functionality. No functional changes. > > - both start proc and start param parameters are unused (when we create threads we always start with `thread_native_entry` as thread procedure). Removed members and constructor arguments (had always been called with NULL) > - `valid_reposition_failure()` is unused, returns always false on all platforms. 
Used to return true on Solaris, but I could not find a caller even going back to jdk-8. > - removed `thread_id_offset`, `thread_id_size`, both had been used at one time by C1 but not anymore. > - Finally, removed the platform-independent stub for the windows-only `set_interrupted()`; replaced it with WINDOWS_ONLY at the only two places where it is invoked. Matter of taste, but I find this actually clearer than having a single-platform function looking like a generic one. > > Thanks, Thomas Hi Thomas, This cleanup looks good! Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6809 From tschatzl at openjdk.java.net Mon Dec 13 09:47:14 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 13 Dec 2021 09:47:14 GMT Subject: RFR: JDK-8278585: Drop unused code from OSThread In-Reply-To: References: Message-ID: On Sun, 12 Dec 2021 07:12:19 GMT, Thomas Stuefe wrote: > Gentle cleanup of OSThread, removes some unused functionality. No functional changes. > > - both start proc and start param parameters are unused (when we create threads we always start with `thread_native_entry` as thread procedure). Removed members and constructor arguments (had always been called with NULL) > - `valid_reposition_failure()` is unused, returns always false on all platforms. Used to return true on Solaris, but I could not find a caller even going back to jdk-8. > - removed `thread_id_offset`, `thread_id_size`, both had been used at one time by C1 but not anymore. > - Finally, removed the platform-independent stub for the windows-only `set_interrupted()`; replaced it with WINDOWS_ONLY at the only two places where it is invoked. Matter of taste, but I find this actually clearer than having a single-platform function looking like a generic one. > > Thanks, Thomas Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6809 From duke at openjdk.java.net Mon Dec 13 09:53:17 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 13 Dec 2021 09:53:17 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v10] In-Reply-To: References: Message-ID: On Sun, 12 Dec 2021 10:19:30 GMT, Andrew Haley wrote: >> `-mbranch-protection` switches on both PAC-RET and BTI. This PR only covers a use of PAC that looks very ROP-focused to me. > > True, because we don't (yet) support BTI. Is there any point having two separate flags for BTI and PAC-RET? If someone wants one, they'll very likely want the other, won't they? You can support one without the other. The architecture allows you to have one without the other. The GCC flag is an enum of "none|standard|pac-ret[+leaf]|bti", with some of them changing depending on which cpu you specify to -mcpu (8.0,8.3,8.5 etc). Clang has the same flags. Interestingly, on MacOS Clang, -mbranch-protection is available but it'll give incorrect code. Instead you build with -arch arm64e. If your system had both, the only scenario I could see for only wanting just one would be for test/dev purposes. In a real production scenario you would want everything the system supports or nothing. An earlier version of my code had a UseBranchProtection="pac|bti|pac+bti|all|none" style option ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Mon Dec 13 10:00:18 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 13 Dec 2021 10:00:18 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v10] In-Reply-To: References: Message-ID: On Mon, 13 Dec 2021 09:50:26 GMT, Alan Hayward wrote: > You can support one without the other. The architecture allows you to have one without the other. 
The GCC flag is an enum of "none|standard|pac-ret[+leaf]|bti", with some of them changing depending on which cpu you specify to -mcpu (8.0,8.3,8.5 etc). Clang has the same flags. OK, so we have a precedent. > If your system had both, the only scenario I could see for only wanting just one would be for test/dev purposes. In a real production scenario you would want everything the system supports or nothing. Yes. > An earlier version of my code had a UseBranchProtection="pac|bti|pac+bti|all|none" style option That sounds great. It seems to me that following the GCC/Clang precedent is the wisest thing we could do. I can see no possible advantage in diverging: it only confuses programmers. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From thartmann at openjdk.java.net Mon Dec 13 10:20:15 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 13 Dec 2021 10:20:15 GMT Subject: RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> References: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> Message-ID: On Thu, 9 Dec 2021 22:43:28 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it *will* overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. As Vladimir mentioned, the fix will be forward ported to JDK 19 automatically. This PR should be closed without integration. 
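
For illustration, the overrun condition quoted in this message can be restated as a small predicate. This is only a sketch of the description, not code from the fix: the name `may_overrun_output` is hypothetical, and folding the two "will not overwrite" cases into a single expression is an assumed reading of the text.

```cpp
#include <cstddef>

// Hypothetical helper, not part of the JDK. It restates the quoted
// description of the pre-fix decoder: inputs shorter than 64 bytes are
// safe, and longer inputs are safe when (length mod 64) >= 16.
constexpr bool may_overrun_output(std::size_t encoded_len) {
  return encoded_len >= 64 && (encoded_len % 64) < 16;
}
```

Under this reading, encoded lengths such as 64 or 79 fall into the overrun case, while 63 and 80 do not.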
------------- PR: https://git.openjdk.java.net/jdk/pull/6786 From duke at openjdk.java.net Mon Dec 13 11:52:20 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 13 Dec 2021 11:52:20 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v10] In-Reply-To: References: Message-ID: <9LgothtcvZnEscN5EaEo_reN-J77pTeSzTwKrR8zgQk=.723b7b37-cf5f-4746-a910-7d99df07749e@github.com> On Mon, 13 Dec 2021 09:56:41 GMT, Andrew Haley wrote: >> You can support one without the other. >> The architecture allows you to have one without the other. >> The GCC flag is an enum of "none|standard|pac-ret[+leaf]|bti", with some of them changing depending on which cpu you specify to -mcpu (8.0,8.3,8.5 etc). >> Clang has the same flags. Interestingly, on MacOS Clang, -mbranch-protection is available but it'll give incorrect code. Instead you build with -arch arm64e. >> >> If your system had both, the only scenario I could see for only wanting just one would be for test/dev purposes. In a real production scenario you would want everything the system supports or nothing. >> >> An earlier version of my code had a UseBranchProtection="pac|bti|pac+bti|all|none" style option > >> You can support one without the other. The architecture allows you to have one without the other. The GCC flag is an enum of "none|standard|pac-ret[+leaf]|bti", with some of them changing depending on which cpu you specify to -mcpu (8.0,8.3,8.5 etc). Clang has the same flags. > > OK, so we have a precedent. > >> If your system had both, the only scenario I could see for only wanting just one would be for test/dev purposes. In a real production scenario you would want everything the system supports or nothing. > > Yes. > >> An earlier version of my code had a UseBranchProtection="pac|bti|pac+bti|all|none" style option > > That sounds great. > > It seems to me that following the GCC/Clang precedent is the wisest thing we could do. 
I can see no possible advantage in diverging: it only confuses programmers.

That gives us:

A new flag -XX:UseBranchProtection
With the options:
none - no PAC support. (Default)
standard - PAC support if the system supports it and the java binary was compiled with PAC. Otherwise off.
pac-ret - PAC support, regardless if the system supports it or the java binary was compiled with PAC.

A later BTI patch would add:
standard - also adds BTI if the system supports it and the java binary was compiled with BTI.
bti - BTI support, regardless if the system supports it or the java binary was compiled with BTI.
Also, concat the flags with "+". Eg: standard+bti. No need to do this until BTI is added.

For MacOS, you can only use PAC functionality when compiled for arm64e. Therefore arm64e would be supported by compiling the java binary for the arm64e and would always be enabled in that scenario. UseBranchProtection on MacOS will only support the none option.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From smonteith at openjdk.java.net Mon Dec 13 12:36:09 2021
From: smonteith at openjdk.java.net (Stuart Monteith)
Date: Mon, 13 Dec 2021 12:36:09 GMT
Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64
In-Reply-To: References: Message-ID:

On Fri, 10 Dec 2021 21:02:53 GMT, Evgeny Astigeevich wrote:

> This JVM SpinPause can use different implementations of spin pauses. It uses `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides an instruction runner together with the description of the instruction and the instruction count. It can be used at places where generation of spin pauses is not possible, like in runtime SpinPause function.
>
> The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause.
However JVM SpinPause might need less instructions than the intrinsic. To support such cases the instruction runner interface supports the `count` parameter.
>
> Testing results for fastdebug and release builds:
> - `gtest`: Passed
> - `tier1`...`tier4`: Passed
> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed
>
> JVM SpinPause is used for the synchronised statements and can benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`.
>
> Benchmarking results (number of samples per an experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+

src/hotspot/cpu/aarch64/spin_wait_aarch64.hpp line 71:

> 69:
> 70:   SpinWait(Inst inst = NONE, int count = 0, InstRunner inst_runner = run_none) :
> 71:     _inst(inst), _count(count), _inst_runner(inst_runner) {}

Wouldn't it make more sense to have _inst_runner initialized in the constructor based on the value of Inst inst? You aren't differentiating between the two in get_spin_wait_desc anyway.
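
A minimal sketch of what this suggestion could look like, under stated assumptions. The runner signature, the set of `Inst` values, the `run_nop`/`run_isb`/`run_yield` stand-ins and the `runner_for` helper are all hypothetical and not taken from the patch; the stand-in runners are empty so the sketch stays portable.

```cpp
// Hypothetical sketch, not the code under review.
class SpinWait {
 public:
  enum Inst { NONE = -1, NOP, ISB, YIELD };  // assumed set of values
  typedef void (*InstRunner)(int count);     // assumed runner signature

  // Empty stand-ins: real runners would execute `count` instructions.
  static void run_none(int) {}
  static void run_nop(int) {}
  static void run_isb(int) {}
  static void run_yield(int) {}

  // Maps an instruction kind to its runner in one place.
  static constexpr InstRunner runner_for(Inst inst) {
    return inst == NOP   ? &run_nop
         : inst == ISB   ? &run_isb
         : inst == YIELD ? &run_yield
                         : &run_none;
  }

  // The runner is derived from `inst`, so a caller cannot pass a
  // mismatched instruction/runner pair.
  constexpr SpinWait(Inst inst = NONE, int count = 0)
      : _inst(inst), _count(count), _inst_runner(runner_for(inst)) {}

  constexpr Inst inst() const { return _inst; }
  constexpr int inst_count() const { return _count; }
  constexpr InstRunner inst_runner() const { return _inst_runner; }

 private:
  Inst _inst;
  int _count;
  InstRunner _inst_runner;
};
```

The design point is that the constructor loses its third parameter, so the instruction kind becomes the single source of truth for which runner is used.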
------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From hseigel at openjdk.java.net Mon Dec 13 13:38:17 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 13 Dec 2021 13:38:17 GMT Subject: RFR: 8277481: Obsolete seldom used CDS flags [v2] In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 19:49:48 GMT, Harold Seigel wrote: >> Please review this change to obsolete deprecated CDS options UseSharedSpaces, RequireSharedSpaces, DynamicDumpSharedSpaces, and DumpSharedSpaces. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows and Mach5 tiers 3-5 on Linux x64 and Windows x64. >> >> The use of UseSharedSpaces in ps_core_common.c was tested on Mac OS x64 by temporarily removing serviceability/sa/ClhsdbPmap.java#core from the problem list. >> >> Thanks! Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > fix print_debug() message Thanks Ioi, Calvin, and David for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/6800 From hseigel at openjdk.java.net Mon Dec 13 13:38:17 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 13 Dec 2021 13:38:17 GMT Subject: Integrated: 8277481: Obsolete seldom used CDS flags In-Reply-To: References: Message-ID: On Fri, 10 Dec 2021 15:01:29 GMT, Harold Seigel wrote: > Please review this change to obsolete deprecated CDS options UseSharedSpaces, RequireSharedSpaces, DynamicDumpSharedSpaces, and DumpSharedSpaces. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows and Mach5 tiers 3-5 on Linux x64 and Windows x64. > > The use of UseSharedSpaces in ps_core_common.c was tested on Mac OS x64 by temporarily removing serviceability/sa/ClhsdbPmap.java#core from the problem list. > > Thanks! Harold This pull request has now been integrated. 
Changeset: 14f7385a Author: Harold Seigel URL: https://git.openjdk.java.net/jdk/commit/14f7385a72972e1f15b3103cc75a60c5733f6d98 Stats: 152 lines in 13 files changed: 22 ins; 94 del; 36 mod 8277481: Obsolete seldom used CDS flags Reviewed-by: iklam, ccheung, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6800 From duke at openjdk.java.net Mon Dec 13 15:56:02 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 13 Dec 2021 15:56:02 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v2] In-Reply-To: References: Message-ID: > This JVM SpinPause can use different implementations of spin pauses. It uses `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides an instruction runner together with the description of the instruction and the instruction count. It can be used at places where generation of spin pauses is not possible, like in runtime SpinPause function. > > The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause. However JVM SpinPause might need less instructions than the intrinsic. To support such cases the instruction runner interface supports the `count` parameter. > > Testing results for fastdebug and release builds: > - `gtest`: Passed > - `tier1`...`tier4`: Passed > - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed > > JVM SpinPause is used for the synchronised statements and can benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`. 
>
> Benchmarking results (number of samples per an experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Reimplement JVM SpinPause with using stub code

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6803/files
  - new: https://git.openjdk.java.net/jdk/pull/6803/files/901b5908..7c9877e5

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6803&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6803&range=00-01

  Stats: 76 lines in 8 files changed: 27 ins; 40 del; 9 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6803.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6803/head:pull/6803

PR: https://git.openjdk.java.net/jdk/pull/6803

From duke at openjdk.java.net Mon Dec 13 15:57:22 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Mon, 13 Dec 2021 15:57:22 GMT
Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64
In-Reply-To: References: Message-ID:

On Sat, 11 Dec 2021 09:25:51 GMT, Andrew Haley wrote:

> > @nick-arm @theRealAph @stooart-mon Hi, could
you have a look please? > > This is way too complicated. I'd use `MacroAssembler::spin_wait()` to generate a stub and call it from `SpinPause`. Hi @theRealAph, Thank you for advice. It was a good exercise to learn how to write a stub generator. I reimplemented SpinPause to use a generated stub. ------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From smonteith at openjdk.java.net Mon Dec 13 17:28:15 2021 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Mon, 13 Dec 2021 17:28:15 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v2] In-Reply-To: References: Message-ID: On Mon, 13 Dec 2021 15:56:02 GMT, Evgeny Astigeevich wrote: >> This JVM SpinPause uses a spin wait stub. The stub is generated based on the `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides the description of the instruction and the instruction count. >> >> The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause. >> >> Testing results for fastdebug and release builds: >> - `gtest`: Passed >> - `tier1`...`tier4`: Passed >> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed >> >> JVM SpinPause is used for the synchronised statements and can benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`. 
>>
>> Benchmarking results (number of samples per an experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>>
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
>> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
>> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
>> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
>> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
>> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
>> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Reimplement JVM SpinPause with using stub code

src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 387:

> 385:   if (VM_Version::spin_wait_desc().inst() == SpinWait::NONE) {
> 386:     return 0;
> 387:   }

Would it be safe and more efficient to test func for NULL?
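
One possible shape of the suggested check, as a hedged sketch rather than the code from the PR. `SpinWaitFunc`, `spin_wait_stub`, `fake_stub`, `stub_calls` and `spin_pause_sketch` are hypothetical names; in HotSpot the pointer would refer to generated stub code instead of a plain global, and the return convention here is assumed from the `return 0` in the quoted snippet.

```cpp
// Hypothetical stand-in for the address of the generated spin-wait stub.
typedef void (*SpinWaitFunc)();

static SpinWaitFunc spin_wait_stub = nullptr;  // null when no stub was generated
static int stub_calls = 0;

// Test double used only to observe that the stub is invoked.
static void fake_stub() { ++stub_calls; }

// Returns 0 when no pause was executed and 1 otherwise (assumed convention).
int spin_pause_sketch() {
  SpinWaitFunc func = spin_wait_stub;
  if (func == nullptr) {
    return 0;  // one pointer test replaces the query of the SpinWait description
  }
  (*func)();
  return 1;
}
```

With this shape the fast path needs only the pointer that has to be loaded for the call anyway.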
------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From duke at openjdk.java.net Mon Dec 13 17:39:14 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Mon, 13 Dec 2021 17:39:14 GMT Subject: RFR: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: References: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> Message-ID: <3JLESi4g23s3QwQkcfPgPs7WPWcePKI1keGA4AMtqJA=.323f4848-f525-488d-b37d-52442385b6d2@github.com> On Mon, 13 Dec 2021 10:17:07 GMT, Tobias Hartmann wrote: >> The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. So the case where it *will* overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. >> >> I also added a test case to detect this overrun. > > As Vladimir mentioned, the fix will be forward ported to JDK 19 automatically. This PR should be closed without integration. Thank you, @TobiHartmann. Closing this PR now. ------------- PR: https://git.openjdk.java.net/jdk/pull/6786 From duke at openjdk.java.net Mon Dec 13 17:39:14 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Mon, 13 Dec 2021 17:39:14 GMT Subject: Withdrawn: 8273108: RunThese24H crashes with SEGV in markWord::displaced_mark_helper() after JDK-8268276 In-Reply-To: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> References: <9vnRdXesAbnQtJ2n2zxs1o8lmhDNnGUC2FCziDqa_E0=.cacf7679-c8f5-4c7f-8a36-26600f76a219@github.com> Message-ID: On Thu, 9 Dec 2021 22:43:28 GMT, Scott Gibbons wrote: > The base64 decoder overwrites memory past the end of its output buffer in certain cases. It will not overwrite if the encoded string length is < 64 bytes. It also will not overwrite if the encoded string length mod 64 is >= 16. 
So the case where it *will* overwrite is when the input string length (the encoded byte length) mod 64 is less than 16. > > I also added a test case to detect this overrun. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6786 From duke at openjdk.java.net Mon Dec 13 17:44:09 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 13 Dec 2021 17:44:09 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v2] In-Reply-To: References: Message-ID: On Mon, 13 Dec 2021 17:24:58 GMT, Stuart Monteith wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Reimplement JVM SpinPause with using stub code > > src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 387: > >> 385: if (VM_Version::spin_wait_desc().inst() == SpinWait::NONE) { >> 386: return 0; >> 387: } > > Would it be safe and more efficient to test func for NULL? Good idea. ------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From duke at openjdk.java.net Mon Dec 13 17:59:53 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 13 Dec 2021 17:59:53 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v3] In-Reply-To: References: Message-ID: > This JVM SpinPause uses a spin wait stub. The stub is generated based on the `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides the description of the instruction and the instruction count. > > The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause. 
>
> Testing results for fastdebug and release builds:
> - `gtest`: Passed
> - `tier1`...`tier4`: Passed
> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed
>
> JVM SpinPause is used for the synchronised statements and can benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`.
>
> Benchmarking results (number of samples per an experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Check if StubRoutines::aarch64::spin_wait() is not null

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6803/files
  - new: https://git.openjdk.java.net/jdk/pull/6803/files/7c9877e5..419508b9

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6803&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6803&range=01-02

  Stats: 7 lines in 1 file changed: 3 ins; 4 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6803.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6803/head:pull/6803

PR: https://git.openjdk.java.net/jdk/pull/6803

From duke at openjdk.java.net Mon
Dec 13 17:59:54 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 13 Dec 2021 17:59:54 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v2] In-Reply-To: References: Message-ID: On Mon, 13 Dec 2021 17:41:00 GMT, Evgeny Astigeevich wrote: >> src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 387: >> >>> 385: if (VM_Version::spin_wait_desc().inst() == SpinWait::NONE) { >>> 386: return 0; >>> 387: } >> >> Would it be safe and more efficient to test func for NULL? > > Good idea. Done ------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From coleenp at openjdk.java.net Mon Dec 13 23:22:42 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 13 Dec 2021 23:22:42 GMT Subject: RFR: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation Message-ID: This change makes VM_Version_Ext part of VM_Version (the platform dependent part) and moves some duplicated code. x86 had the most code in VM_Version_Ext, so the most code moved there. There might be some unneeded functions but I didn't want to remove them with this change. Tier1 (tier2-4 testing in progress) on linux and windows for x86, aarch64, Oracle platforms and tested builds on: linux-aarch64-debug,linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug and linux-x64-zero,linux-x64-zero-debug,linux-x86-zero,linux-x86-zero-debug Ran JFR tests manually (it uses os_perf* CPUInformationInterface code). ------------- Commit messages: - Refactor cpu_name, etc into abstract_vm_version. 
- 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation Changes: https://git.openjdk.java.net/jdk/pull/6820/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6820&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8202579 Stats: 2972 lines in 33 files changed: 1105 ins; 1830 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/6820.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6820/head:pull/6820 PR: https://git.openjdk.java.net/jdk/pull/6820 From dholmes at openjdk.java.net Tue Dec 14 02:53:13 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 14 Dec 2021 02:53:13 GMT Subject: RFR: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation In-Reply-To: References: Message-ID: <1XdDFHOWs-V8LCU8RhRL2ZH3E9l7WM_vI_QRZCoOcaw=.7e8cb129-046e-45d7-887e-ea2669c77d33@github.com> On Mon, 13 Dec 2021 23:14:43 GMT, Coleen Phillimore wrote: > This change makes VM_Version_Ext part of VM_Version (the platform dependent part) and moves some duplicated code. x86 had the most code in VM_Version_Ext, so the most code moved there. There might be some unneeded functions but I didn't want to remove them with this change. > > Tier1 (tier2-4 testing in progress) on linux and windows for x86, aarch64, Oracle platforms and tested builds on: > linux-aarch64-debug,linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug > and > linux-x64-zero,linux-x64-zero-debug,linux-x86-zero,linux-x86-zero-debug > > Ran JFR tests manually (it uses os_perf* CPUInformationInterface code). Hi Coleen, The code relocation seems okay. Good cleanup! One query below on `CPUInformationInterface::initialize()`. Good catch on the `Abstract_VM_Version` calls in jvm.cpp! I missed that in the review of JDK-8241071. 
Thanks, David src/hotspot/os/linux/os_perf_linux.cpp line 929: > 927: bool CPUInformationInterface::initialize() { > 928: _cpu_info = new CPUInformation(); > 929: VM_Version::initialize_cpu_information(); I can't figure out when this code will actually get executed in relation to the VM initialization process and VM_Version's initialization. Can this actually execute before that happens? Or could we assert that it has happened? ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6820 From stuefe at openjdk.java.net Tue Dec 14 05:49:14 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 14 Dec 2021 05:49:14 GMT Subject: RFR: JDK-8278585: Drop unused code from OSThread In-Reply-To: References: Message-ID: On Mon, 13 Dec 2021 06:43:56 GMT, David Holmes wrote: >> Gentle cleanup of OSThread, removes some unused functionality. No functional changes. >> >> - both start proc and start param parameters are unused (when we create threads we always start with `thread_native_entry` as thread procedure). Removed members and constructor arguments (had always been called with NULL) >> - `valid_reposition_failure()` is unused, returns always false on all platforms. Used to return true on Solaris, but I could not find a caller even going back to jdk-8. >> - removed `thread_id_offset`, `thread_id_size`, both had been used at one time by C1 but not anymore. >> - Finally, removed the platform-independent stub for the windows-only `set_interrupted()`; replaced it with WINDOWS_ONLY at the only two places where it is invoked. Matter of taste, but I find this actually clearer than having a single-platform function looking like a generic one. >> >> Thanks, Thomas > > Hi Thomas, > > This cleanup looks good! > > Thanks, > David Thanks @dholmes-ora and @tschatzl . 
------------- PR: https://git.openjdk.java.net/jdk/pull/6809 From stuefe at openjdk.java.net Tue Dec 14 05:49:14 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 14 Dec 2021 05:49:14 GMT Subject: Integrated: JDK-8278585: Drop unused code from OSThread In-Reply-To: References: Message-ID: On Sun, 12 Dec 2021 07:12:19 GMT, Thomas Stuefe wrote: > Gentle cleanup of OSThread, removes some unused functionality. No functional changes. > > - both start proc and start param parameters are unused (when we create threads we always start with `thread_native_entry` as thread procedure). Removed members and constructor arguments (had always been called with NULL) > - `valid_reposition_failure()` is unused, returns always false on all platforms. Used to return true on Solaris, but I could not find a caller even going back to jdk-8. > - removed `thread_id_offset`, `thread_id_size`, both had been used at one time by C1 but not anymore. > - Finally, removed the platform-independent stub for the windows-only `set_interrupted()`; replaced it with WINDOWS_ONLY at the only two places where it is invoked. Matter of taste, but I find this actually clearer than having a single-platform function looking like a generic one. > > Thanks, Thomas This pull request has now been integrated. Changeset: 3f9638d1 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/3f9638d124076019f49eb77bc3ff8b466e4beb53 Stats: 57 lines in 12 files changed: 0 ins; 44 del; 13 mod 8278585: Drop unused code from OSThread Reviewed-by: dholmes, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/6809 From svkamath at openjdk.java.net Tue Dec 14 06:23:48 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 14 Dec 2021 06:23:48 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 Message-ID: The failure happens with XX:+DeoptimizeAlot option. 
I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. ------------- Commit messages: - Fix for JDK:8274323 TestAESMain fails with invalid offset Changes: https://git.openjdk.java.net/jdk18/pull/19/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=19&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274323 Stats: 106 lines in 2 files changed: 31 ins; 26 del; 49 mod Patch: https://git.openjdk.java.net/jdk18/pull/19.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/19/head:pull/19 PR: https://git.openjdk.java.net/jdk18/pull/19 From pli at openjdk.java.net Tue Dec 14 08:56:43 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Tue, 14 Dec 2021 08:56:43 GMT Subject: RFR: 8183390: Fix and re-enable post loop vectorization Message-ID: ### Background Post loop vectorization is a C2 compiler optimization in an experimental VM feature called PostLoopMultiversioning. It transforms the range-check eliminated post loop to a 1-iteration vectorized loop with vector mask. This optimization was contributed by Intel in 2016 to support x86 AVX512 masked vector instructions. However, it was disabled soon after an issue was found. Due to insufficient maintenance in these years, multiple bugs have been accumulated inside. But we (Arm) still think this is a useful framework for vector mask support in C2 auto-vectorized loops, for both x86 AVX512 and AArch64 SVE. Hence, we propose this to fix and re-enable post loop vectorization. ### Changes in this patch This patch reworks post loop vectorization. The most significant change is removing vector mask support in C2 x86 backend and re-implementing it in the mid-end. With this, we can re-enable post loop vectorization for platforms other than x86. Previous implementation hard-codes x86 k1 register as a reserved AVX512 opmask register and defines two routines (setvectmask/restorevectmask) to set and restore the value of k1. 
But after [JDK-8211251](https://bugs.openjdk.java.net/browse/JDK-8211251) which encodes AVX512 instructions as unmasked by default, generated vector masks are no longer used in AVX512 vector instructions. To fix incorrect codegen and add vector mask support for more platforms, we turn to add a vector mask input to C2 mid-end IRs. Specifically, we use a VectorMaskGenNode to generate a mask and replace all Load/Store nodes in the post loop into LoadVectorMasked/StoreVectorMasked nodes with that mask input. This IR form is exactly the same to those which are used in VectorAPI mask support. For now, we only add mask inputs for Load/Store nodes because we don't have reduction operations supported in post loop vectorization. After this change, the x86 k1 register is no longer reserved and can be allocated when PostLoopMultiversioning is enabled. Besides this change, we have fixed a compiler crash and five incorrect result issues with post loop vectorization. **I) C2 crashes with segmentation fault in strip-mined loops** Previous implementation was done before C2 loop strip-mining was merged into JDK master so it didn't take strip-mined loops into consideration. In C2's strip mined loops, post loop is not the sibling of the main loop in ideal loop tree. Instead, it's the sibling of the main loop's parent. This patch fixed a SIGSEGV issue caused by NULL pointer when locating post loop from strip-mined main loop. **II) Incorrect result issues with post loop vectorization** We have also fixed five incorrect vectorization issues. Some of them are hidden deep and can only be reproduced with corner cases. These issues have a common cause that it assumes the post loop can be vectorized if the vectorization in corresponding main loop is successful. But in many cases this assumption is wrong. Below are details. 
- **[Issue-1] Incorrect vectorization for partial vectorizable loops** This issue can be reproduced by below loop where only some operations in the loop body are vectorizable. for (int i = 0; i < 10000; i++) { res[i] = a[i] * b[i]; k = 3 * k + 1; } In the main loop, superword can work well if parts of the operations in loop body are not vectorizable since those parts can be unrolled only. But for post loops, we don't create vectors through combining scalar IRs generated from loop unrolling. Instead, we are doing scalars to vectors replacement for all operations in the loop body. Hence, all operations should be either vectorized together or not vectorized at all. To fix this kind of cases, we add an extra field "_slp_vector_pack_count" in CountedLoopNode to record the eventual count of vector packs in the main loop. This value is then passed to post loop and compared with post loop pack count. Vectorization will be bailed out in post loop if it creates more vector packs than in the main loop. - **[Issue-2] Incorrect result in loops with growing-down vectors** This issue appears with growing-down vectors, that is, vectors that grow to smaller memory address as the loop iterates. It can be reproduced by below counting-up loop with negative scale value in array index. for (int i = 0; i < 10000; i++) { a[MAX - i] = b[MAX - i]; } Cause of this issue is that for a growing-down vector, generated vector mask value has reversed vector-lane order so it masks incorrect vector lanes. Note that if negative scale value appears in counting-down loops, the vector will be growing up. With this rule, we fix the issue by only allowing positive array index scales in counting-up loops and negative array index scales in counting-down loops. This check is done with the help of SWPointer by comparing scale values in each memory access in the loop with loop stride value. - **[Issue-3] Incorrect result in manually unrolled loops** This issue can be reproduced by below manually unrolled loop. 
for (int i = 0; i < 10000; i += 2) { c[i] = a[i] + b[i]; c[i + 1] = a[i + 1] * b[i + 1]; } In this loop, operations in the 2nd statement duplicate those in the 1st statement with a small memory address offset. Vectorization in the main loop works well in this case because C2 does further unrolling and pack combination. But we cannot vectorize the post loop through replacement from scalars to vectors because it creates duplicated vector operations. To fix this, we restrict post loop vectorization to loops with stride values of 1 or -1. - **[Issue-4] Incorrect result in loops with mixed vector element sizes** This issue is found after we enable post loop vectorization for AArch64. It's reproducible by multiple array operations with different element sizes inside a loop. On x86, there is no issue because the values of x86 AVX512 opmasks only depend on which vector lanes are active. But AArch64 is different - the values of SVE predicates also depend on lane size of the vector. Hence, on AArch64 SVE, if a loop has mixed vector element sizes, we should use different vector masks. For now, we just support loops with only one vector element size, i.e., "int + float" vectors in a single loop is ok but "int + double" vectors in a single loop is not vectorizable. This fix also enables subword vectors support to make all primitive type array operations vectorizable. - **[Issue-5] Incorrect result in loops with potential data dependence** This issue can be reproduced by below corner case on AArch64 only. for (int i = 0; i < 10000; i++) { a[i] = x; a[i + OFFSET] = y; } In this case, two stores in the loop have data dependence if the OFFSET value is smaller than the vector length. So we cannot do vectorization through replacing scalars to vectors. But the main loop vectorization in this case is successful on AArch64 because AArch64 has partial vector load/store support. It splits vector fill with different values in lanes to several smaller-sized fills. 
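To make the Issue-5 hazard concrete, here is a minimal sketch (not code from the patch) that compares the scalar loop with a naive emulation of scalar-to-vector replacement, where each store statement is applied to a whole block of lanes before the next statement runs. The lane count `VLEN`, the small `OFFSET`, and the class and method names are illustrative assumptions only.

```java
import java.util.Arrays;

public class OverlappingStores {
    static final int VLEN = 4;    // hypothetical number of vector lanes
    static final int OFFSET = 2;  // smaller than VLEN, so the two stores overlap

    // Scalar semantics: both stores of iteration i complete before iteration i + 1.
    public static int[] scalar(int n, int x, int y) {
        int[] a = new int[n + OFFSET];
        for (int i = 0; i < n; i++) {
            a[i] = x;
            a[i + OFFSET] = y;
        }
        return a;
    }

    // Naive scalar-to-vector replacement: for each block of up to VLEN iterations,
    // all lanes of the first store are written, then all lanes of the second store.
    public static int[] naiveVector(int n, int x, int y) {
        int[] a = new int[n + OFFSET];
        for (int base = 0; base < n; base += VLEN) {
            int lanes = Math.min(VLEN, n - base);  // plays the role of the vector mask
            for (int l = 0; l < lanes; l++) a[base + l] = x;
            for (int l = 0; l < lanes; l++) a[base + l + OFFSET] = y;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(scalar(4, 1, 2)));      // [1, 1, 1, 1, 2, 2]
        System.out.println(Arrays.toString(naiveVector(4, 1, 2))); // [1, 1, 2, 2, 2, 2]
    }
}
```

The two results differ at indices 2 and 3, which is the kind of mismatch the extra dependence check is meant to rule out.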
In this patch, we add additional data dependence check for this kind of cases. The check is also done with the help of SWPointer class. In this check, we require that every two memory accesses (with at least one store) of the same element type (or subword size) in the loop has the same array index expression. ### Tests So far we have tested full jtreg on both x86 AVX512 and AArch64 SVE with experimental VM option "PostLoopMultiversioning" turned on. We found no issue in all tests. We notice that those existing cases are not enough because some of above issues are not spotted by them. We would like to add some new cases but we found existing vectorization tests are a bit cumbersome - golden results must be pre-calculated and hard-coded in the test code for correctness verification. Thus, in this patch, we propose a new vectorization testing framework. Our new framework brings a simpler way to add new cases. For a new test case, we only need to create a new method annotated with "@Test". The test runner will invoke each annotated method twice automatically. First time it runs in the interpreter and second time it's forced compiled by C2. Then the two return results are compared. So in this framework each test method should return a primitive value or an array of primitives. In this way, no extra verification code for vectorization correctness is required. This test runner is still jtreg-based and takes advantages of the jtreg WhiteBox API, which enables test methods running at specific compilation levels. Each test class inside is also jtreg-based. It just need to inherit from the test runner class and run with two additional options "-Xbootclasspath/a:." and "-XX:+WhiteBoxAPI". ### Summary & Future work In this patch, we reworked post loop vectorization. We made it platform independent and fixed several issues inside. We also implemented a new vectorization testing framework with many test cases inside. Meanwhile, we did some code cleanups. 
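The "run twice and compare" idea behind the test runner described under Tests can be sketched roughly as follows. This is only an illustration under loud assumptions: the `Test` annotation, `Cases` class and `runAll` method are hypothetical names, and the sketch simply invokes each method twice in whatever mode the JVM picks, whereas the framework in the patch is described as using the jtreg WhiteBox API to run once in the interpreter and once compiled by C2.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Objects;

public class MiniVectorizationRunner {
    // Hypothetical stand-in for the framework's "@Test" annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Test {}

    // Example test class: each method returns a primitive or an array of primitives.
    public static class Cases {
        @Test
        public int[] multiply() {
            int[] a = new int[100], b = new int[100], res = new int[100];
            for (int i = 0; i < 100; i++) { a[i] = i; b[i] = 2 * i + 1; }
            for (int i = 0; i < 100; i++) { res[i] = a[i] * b[i]; }
            return res;
        }

        @Test
        public int sum() {
            int s = 0;
            for (int i = 0; i < 100; i++) { s += i; }
            return s;
        }
    }

    // Invokes every annotated method twice and compares the two results.
    // Returns the number of methods checked.
    public static int runAll(Object instance) {
        int checked = 0;
        for (Method m : instance.getClass().getDeclaredMethods()) {
            if (!m.isAnnotationPresent(Test.class)) continue;
            try {
                Object first = m.invoke(instance);   // stands in for the interpreted run
                Object second = m.invoke(instance);  // stands in for the C2-compiled run
                // Objects.deepEquals also compares primitive arrays element by element.
                if (!Objects.deepEquals(first, second)) {
                    throw new AssertionError("Mismatch in " + m.getName());
                }
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
            checked++;
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println(runAll(new Cases()) + " test methods checked");
    }
}
```

Because the comparison is generic, a new case needs no hand-computed expected values; it only has to return its result.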
This patch only touches C2 code guarded with PostLoopMultiversioning, except a few data structure changes. So, there's no behavior change when experimental VM option PostLoopMultiversioning is off. Also, to reduce risks, we still propose to keep post loop vectorization experimental for now. But if it receives positive feedback, we would like to change it to non-experimental in the future. ------------- Commit messages: - 8183390: Fix and re-enable post loop vectorization Changes: https://git.openjdk.java.net/jdk/pull/6828/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6828&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8183390 Stats: 4793 lines in 39 files changed: 4482 ins; 284 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/6828.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6828/head:pull/6828 PR: https://git.openjdk.java.net/jdk/pull/6828 From fgao at openjdk.java.net Tue Dec 14 09:21:19 2021 From: fgao at openjdk.java.net (Fei Gao) Date: Tue, 14 Dec 2021 09:21:19 GMT Subject: RFR: 8277619: AArch64: Incorrect parameter type in Advanced SIMD Copy assembler functions In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 09:49:23 GMT, Fei Gao wrote: > Mov (from general), incorrectly uses SIMD_Arrangement as the parameter > type of the assembler function. However, from Arm ARM [1], it's more > precise to use SIMD_RegVariant here. > > The situation is similar to Mov(to general) [2]. > > Note that as Mov(to general) is an alias of UMOV, we turn to re-use > UMOV encoding for Mov(to general) in this patch. > > [1] https://developer.arm.com/documentation/ddi0602/2020-12/SIMD-FP-Instructions/MOV--from-general---Move-general-purpose-register-to-a-vector-element--an-alias-of-INS--general-- > [2] https://developer.arm.com/documentation/ddi0602/2020-12/SIMD-FP-Instructions/MOV--to-general---Move-vector-element-to-general-purpose-register--an-alias-of-UMOV- The PR does some code cleaning in AArch64 assembler. 
Can I have your review please? ------------- PR: https://git.openjdk.java.net/jdk/pull/6629 From duke at openjdk.java.net Tue Dec 14 09:40:03 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Tue, 14 Dec 2021 09:40:03 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v11] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. 
If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Change UseROPProtection to UseBranchProtection Change-Id: I31c5e1bb5c285262f262459c13057a46221682f1 CustomizedGitHooks: yes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/63f7515f..9c4f3498 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=09-10 Stats: 41 lines in 7 files changed: 15 ins; 7 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Tue Dec 14 09:40:04 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Tue, 14 Dec 2021 09:40:04 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v10] In-Reply-To: 
<9LgothtcvZnEscN5EaEo_reN-J77pTeSzTwKrR8zgQk=.723b7b37-cf5f-4746-a910-7d99df07749e@github.com> References: <9LgothtcvZnEscN5EaEo_reN-J77pTeSzTwKrR8zgQk=.723b7b37-cf5f-4746-a910-7d99df07749e@github.com> Message-ID: On Mon, 13 Dec 2021 11:48:52 GMT, Alan Hayward wrote: >>> You can support one without the other. The architecture allows you to have one without the other. The GCC flag is an enum of "none|standard|pac-ret[+leaf]|bti", with some of them changing depending on which cpu you specify to -mcpu (8.0,8.3,8.5 etc). Clang has the same flags. >> >> OK, so we have a precedent. >> >>> If your system had both, the only scenario I could see for only wanting just one would be for test/dev purposes. In a real production scenario you would want everything the system supports or nothing. >> >> Yes. >> >>> An earlier version of my code had a UseBranchProtection="pac|bti|pac+bti|all|none" style option >> >> That sounds great. >> >> It seems to me that following the GCC/Clang precedent is the wisest thing we could do. I can see no possible advantage in diverging: it only confuses programmers. > > That gives us: > > A new flag -XX:UseBranchProtection > > With the options: > none - no PAC support. (Default) > standard - PAC support if the system supports it and the java binary was compiled with PAC. Otherwise off. > pac-ret - PAC support, regardless if the system supports it or the java binary was compiled with PAC. > > A later BTI patch would add: > standard - also adds BTI if the system supports it and the java binary was compiled with BTI. > bti - BTI support, regardless if the system supports it or the java binary was compiled with BTI. > Also, concat the flags with "+". Eg: standard+bti. No need to do this until BTI is added. > > > For MacOS, you can only use PAC functionality when compiled for arm64e. Therefore arm64e would be supported by compiling the java binary for the arm64e and would always be enabled in that scenario. 
UseBranchProtection on MacOS will only support the none option. Updated to the above. The CSR will need an update too. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Tue Dec 14 10:16:24 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Dec 2021 10:16:24 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v11] In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 09:40:03 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. 
Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Change UseROPProtection to UseBranchProtection > > Change-Id: I31c5e1bb5c285262f262459c13057a46221682f1 > CustomizedGitHooks: yes Looks fine. I've done some basic testing on Graviton 3, which all seems to work. ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Tue Dec 14 10:24:15 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Dec 2021 10:24:15 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v3] In-Reply-To: References: Message-ID: On Mon, 13 Dec 2021 17:59:53 GMT, Evgeny Astigeevich wrote: >> This JVM SpinPause uses a spin wait stub. The stub is generated based on the `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. 
The `SpinWait` provides the description of the instruction and the instruction count.
>>
>> The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause.
>>
>> Testing results for fastdebug and release builds:
>> - `gtest`: Passed
>> - `tier1`...`tier4`: Passed
>> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed
>>
>> JVM SpinPause is used for the synchronised statements and can be benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`.
>>
>> Benchmarking results (number of samples per experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>>
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
>> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
>> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
>> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
>> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
>> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
>> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Check if StubRoutines::aarch64::spin_wait() is not null

src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 389:

> 387:     if (func == nullptr) {
> 388:       return 0;
> 389:     }

It's better simply to give `_spin_wait` a default value that points to a `ret` instruction.
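The suggestion above amounts to replacing a null check at the call site with a harmless default target. The same idea, sketched in Java purely as an analogy (this is not HotSpot code, and `SpinWaitHook`, `install` and `spinPause` are hypothetical names):

```java
import java.util.Objects;

public final class SpinWaitHook {
    // Default hook does nothing, playing the role of a stub that contains only `ret`.
    private static final Runnable NO_OP = () -> { };
    private static volatile Runnable spinWait = NO_OP;

    // Installs a real stub; null is rejected so callers never have to check for it.
    public static void install(Runnable stub) {
        spinWait = Objects.requireNonNull(stub, "stub");
    }

    // Always safe to call, even before any stub has been installed.
    public static void spinPause() {
        spinWait.run();
    }

    public static void main(String[] args) {
        spinPause();                 // uses the no-op default
        int[] calls = {0};
        install(() -> calls[0]++);
        spinPause();                 // uses the installed stub
        System.out.println(calls[0]);
    }
}
```

The call site stays branch-free, and the "nothing installed yet" case is handled once, where the default is defined.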
------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From aph at openjdk.java.net Tue Dec 14 10:27:13 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Dec 2021 10:27:13 GMT Subject: RFR: 8277619: AArch64: Incorrect parameter type in Advanced SIMD Copy assembler functions In-Reply-To: References: Message-ID: <76e387uxQnax3plvykqrewS_MIhvJEK7BPcONrrtvII=.32f79677-e7d3-4938-8fff-3269b0792c3f@github.com> On Wed, 1 Dec 2021 09:49:23 GMT, Fei Gao wrote: > Mov (from general), incorrectly uses SIMD_Arrangement as the parameter > type of the assembler function. However, from Arm ARM [1], it's more > precise to use SIMD_RegVariant here. > > The situation is similar to Mov(to general) [2]. > > Note that as Mov(to general) is an alias of UMOV, we turn to re-use > UMOV encoding for Mov(to general) in this patch. > > [1] https://developer.arm.com/documentation/ddi0602/2020-12/SIMD-FP-Instructions/MOV--from-general---Move-general-purpose-register-to-a-vector-element--an-alias-of-INS--general-- > [2] https://developer.arm.com/documentation/ddi0602/2020-12/SIMD-FP-Instructions/MOV--to-general---Move-vector-element-to-general-purpose-register--an-alias-of-UMOV- Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6629 From aph at openjdk.java.net Tue Dec 14 11:00:15 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Dec 2021 11:00:15 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v3] In-Reply-To: References: Message-ID: On Mon, 13 Dec 2021 17:59:53 GMT, Evgeny Astigeevich wrote: >> This JVM SpinPause uses a spin wait stub. The stub is generated based on the `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides the description of the instruction and the instruction count. >> >> The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. 
We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause.
>>
>> Testing results for fastdebug and release builds:
>> - `gtest`: Passed
>> - `tier1`...`tier4`: Passed
>> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed
>>
>> JVM SpinPause is used for the synchronised statements and can be benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`.
>>
>> Benchmarking results (number of samples per experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>>
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
>> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
>> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
>> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
>> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
>> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
>> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Check if StubRoutines::aarch64::spin_wait() is not null

src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 385:

> 383: extern "C" {
> 384:   int SpinPause() {
> 385:     using spin_wait_func_ptr_t = void (*)();

I think you'll want `ThreadWXEnable wx(WXExec, thread);` for Apple here.
------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From duke at openjdk.java.net Tue Dec 14 12:27:10 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 14 Dec 2021 12:27:10 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v3] In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 10:56:41 GMT, Andrew Haley wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Check if StubRoutines::aarch64::spin_wait() is not null > > src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 385: > >> 383: extern "C" { >> 384: int SpinPause() { >> 385: using spin_wait_func_ptr_t = void (*)(); > > I think you'll want `ThreadWXEnable wx(WXExec, thread);` for Apple here. Sorry, why do we need it here? This is the linux-aarch64 implementation of SpinPause. I think it should be in SpinPause in `os_cpu/bsd_aarch64/os_bsd_aarch64.cpp`. Am I right? ------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From duke at openjdk.java.net Tue Dec 14 13:20:47 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 14 Dec 2021 13:20:47 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v4] In-Reply-To: References: Message-ID: <_ta0Bs5id1MQ2cBe3CtDxSpTYmImDuyyNl_6fIWgmjU=.9f9dd6a5-a557-4678-9eb1-4a27620b3534@github.com> > This JVM SpinPause uses a spin wait stub. The stub is generated based on the `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides the description of the instruction and the instruction count. > > The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause. 
>
> Testing results for fastdebug and release builds:
> - `gtest`: Passed
> - `tier1`...`tier4`: Passed
> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed
>
> JVM SpinPause is used for the synchronised statements and can be benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`.
>
> Benchmarking results (number of samples per experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: RET only StubRoutines::aarch64::spin_wait() for SpinWait::NONE ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6803/files - new: https://git.openjdk.java.net/jdk/pull/6803/files/419508b9..53d16580 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6803&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6803&range=02-03 Stats: 18 lines in 2 files changed: 11 ins; 4 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6803.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6803/head:pull/6803 PR: https://git.openjdk.java.net/jdk/pull/6803 From duke at
openjdk.java.net Tue Dec 14 13:20:50 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 14 Dec 2021 13:20:50 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v3] In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 10:20:59 GMT, Andrew Haley wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Check if StubRoutines::aarch64::spin_wait() is not null > > src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 389: > >> 387: if (func == nullptr) { >> 388: return 0; >> 389: } > > It's better simply to give `_spin_wait` a default value that points to a `ret` instruction. Done ------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From aph at openjdk.java.net Tue Dec 14 14:32:07 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Dec 2021 14:32:07 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v3] In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 12:23:40 GMT, Evgeny Astigeevich wrote: >> src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 385: >> >>> 383: extern "C" { >>> 384: int SpinPause() { >>> 385: using spin_wait_func_ptr_t = void (*)(); >> >> I think you'll want `ThreadWXEnable wx(WXExec, thread);` for Apple here. > > Sorry, why do we need it here? This is linux-aarch64 implementation of SpinPause. > I think it should be in SpinPause in `os_cpu/bsd_aarch64/os_bsd_aarch64.cpp`. Am I right? You are indeed right. We can worry about Apple another day. ------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From phedlin at openjdk.java.net Tue Dec 14 14:56:32 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 14 Dec 2021 14:56:32 GMT Subject: [jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 Message-ID: Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support. 
Implementation with slight focus on balance between footprint and efficiency, trying to utilise a dual SIMD path (e.g. Neoverse N1) for the additional Ascii-check and avoid performance loss in the ISO-only case. - Interleaved ISO and ASCII check code. - Avoid 'umaxv' in the ISO main flow. - Using post inc in main loop. - Retain 8-char loop. - Removing conditional prefetch (no upside). - Adding ISO-8859-1 to encode-decode benchmark. Testing: tier1-3 The revised version compares like this (master vs. update). Benchmark (size) (type) Mode Cnt Score Error Units CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 17.920 ? 0.229 us/op CharsetEncodeDecode.encode 16384 BIG5 avgt 30 18.867 ? 0.356 us/op CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 17.419 ? 0.220 us/op CharsetEncodeDecode.encode 16384 ISO-8859-1 avgt 30 6.200 ? 0.134 us/op CharsetEncodeDecode.encode 16384 ASCII avgt 30 17.149 ? 0.219 us/op CharsetEncodeDecode.encode 16384 UTF-16 avgt 30 135.115 ? 1.440 us/op Benchmark (size) (type) Mode Cnt Score Error Units CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 9.018 ? 0.179 us/op CharsetEncodeDecode.encode 16384 BIG5 avgt 30 10.550 ? 0.470 us/op CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 8.843 ? 0.187 us/op CharsetEncodeDecode.encode 16384 ISO-8859-1 avgt 30 6.406 ? 0.155 us/op CharsetEncodeDecode.encode 16384 ASCII avgt 30 8.822 ? 0.173 us/op CharsetEncodeDecode.encode 16384 UTF-16 avgt 30 135.195 ? 
1.432 us/op ------------- Commit messages: - 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 Changes: https://git.openjdk.java.net/jdk18/pull/20/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=20&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274243 Stats: 256 lines in 6 files changed: 126 ins; 90 del; 40 mod Patch: https://git.openjdk.java.net/jdk18/pull/20.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/20/head:pull/20 PR: https://git.openjdk.java.net/jdk18/pull/20 From phedlin at openjdk.java.net Tue Dec 14 14:56:32 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 14 Dec 2021 14:56:32 GMT Subject: [jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 10:45:28 GMT, Patric Hedlin wrote: > Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support. > > Implementation with slight focus on balance between footprint and efficiency, trying to utilise a dual SIMD path (e.g. Neoverse N1) for the additional Ascii-check and avoid performance loss in the ISO-only case. > > - Interleaved ISO and ASCII check code. > - Avoid 'umaxv' in the ISO main flow. > - Using post inc in main loop. > - Retain 8-char loop. > - Removing conditional prefetch (no upside). > - Adding ISO-8859-1 to encode-decode benchmark. > > Testing: tier1-3 > > The revised version compares like this (master vs. update). > > Benchmark (size) (type) Mode Cnt Score Error Units > CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 17.920 ? 0.229 us/op > CharsetEncodeDecode.encode 16384 BIG5 avgt 30 18.867 ? 0.356 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 17.419 ? 0.220 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-1 avgt 30 6.200 ? 0.134 us/op > CharsetEncodeDecode.encode 16384 ASCII avgt 30 17.149 ? 0.219 us/op > CharsetEncodeDecode.encode 16384 UTF-16 avgt 30 135.115 ? 
1.440 us/op > > > Benchmark (size) (type) Mode Cnt Score Error Units > CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 9.018 ? 0.179 us/op > CharsetEncodeDecode.encode 16384 BIG5 avgt 30 10.550 ? 0.470 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 8.843 ? 0.187 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-1 avgt 30 6.406 ? 0.155 us/op > CharsetEncodeDecode.encode 16384 ASCII avgt 30 8.822 ? 0.173 us/op > CharsetEncodeDecode.encode 16384 UTF-16 avgt 30 135.195 ? 1.432 us/op Benchmarks, master vs. update (ran on Aurora/Ampere Altra): openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ASCII ........77.55% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:BIG5 .........76.71% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_1 ...-2.31% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_15 ..75.58% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_16 ....... 1.04% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_8 ........76.90% Note that ISO-8859-1 compares with the old intrinsic implementation (essentially the same) and that UTF-16 does not utilise the intrinsic. Runs that show the more pessimistic speed-up, when processing 2^n - 1 chars. 
openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:ASCII .........72.97% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:BIG5 ..........64.46% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:ISO_8859_1 ....-1.67% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:ISO_8859_15 ...70.85% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:UTF_16 ........-4.60% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2047-type:UTF_8 .........70.44% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:ASCII ..........60.35% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:BIG5 ...........52.61% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:ISO_8859_1 ..... 1.75% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:ISO_8859_15 ....61.45% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:UTF_16 .........-1.01% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:511-type:UTF_8 ..........59.46% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ASCII ..........54.26% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:BIG5 ...........42.82% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_1 .....-0.54% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_15 ....64.86% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_16 .........-0.09% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_8 ..........60.44% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ASCII ..........51.51% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:BIG5 ...........46.54% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_1 .....-0.32% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_15 ....56.48% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_16 ......... 
0.44% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_8 ..........54.84% Runs to illustrate the threshold effect between the loops in the implementation. openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:ASCII ...........32.30% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:BIG5 ............31.93% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:ISO_8859_1 ......-0.02% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:ISO_8859_15 .....37.92% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:UTF_16 .......... 4.45% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:32-type:UTF_8 ...........40.35% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ASCII ...........20.06% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:BIG5 ............21.64% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_1 ......-1.13% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_15 .....27.04% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_16 .......... 1.20% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_8 ...........24.72% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ASCII ...........19.37% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:BIG5 ............20.20% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_1 ......-1.01% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_15 .....29.16% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_16 .......... 
0.34% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_8 ...........25.35% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ASCII ...........13.03% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:BIG5 ............13.74% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_1 ......-0.13% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_15 .....19.26% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_16 .......... 0.78% openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_8 ...........17.70% Using the microbenchmarks provided by @carterkozak here: https://github.com/carterkozak/stringbuilder-encoding-performance, comparing master vs. update as follows: Benchmark (charsetName) (message) (timesToAppend) Mode Cnt Score Error Units EncoderBenchmarks.charsetEncoder UTF-8 This is a simple ASCII message 3 avgt 4 151.025 ? 28.111 ns/op EncoderBenchmarks.charsetEncoder UTF-8 This is a message with unicode ?? 3 avgt 4 323.254 ? 5.648 ns/op EncoderBenchmarks.charsetEncoderWithAllocation UTF-8 This is a simple ASCII message 3 avgt 4 244.375 ? 98.844 ns/op EncoderBenchmarks.charsetEncoderWithAllocation UTF-8 This is a message with unicode ?? 3 avgt 4 405.415 ? 5.947 ns/op EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder UTF-8 This is a simple ASCII message 3 avgt 4 728.172 ? 22.419 ns/op EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder UTF-8 This is a message with unicode ?? 3 avgt 4 859.015 ? 90.541 ns/op EncoderBenchmarks.toStringGetBytes UTF-8 This is a simple ASCII message 3 avgt 4 117.044 ? 11.484 ns/op EncoderBenchmarks.toStringGetBytes UTF-8 This is a message with unicode ?? 3 avgt 4 483.399 ? 38.614 ns/op Benchmark (charsetName) (message) (timesToAppend) Mode Cnt Score Error Units EncoderBenchmarks.charsetEncoder UTF-8 This is a simple ASCII message 3 avgt 4 113.954 ? 
7.657 ns/op EncoderBenchmarks.charsetEncoder UTF-8 This is a message with unicode ?? 3 avgt 4 353.266 ? 10.124 ns/op EncoderBenchmarks.charsetEncoderWithAllocation UTF-8 This is a simple ASCII message 3 avgt 4 196.643 ? 52.954 ns/op EncoderBenchmarks.charsetEncoderWithAllocation UTF-8 This is a message with unicode ?? 3 avgt 4 429.157 ? 11.506 ns/op EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder UTF-8 This is a simple ASCII message 3 avgt 4 728.138 ? 34.898 ns/op EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder UTF-8 This is a message with unicode ?? 3 avgt 4 859.697 ? 61.397 ns/op EncoderBenchmarks.toStringGetBytes UTF-8 This is a simple ASCII message 3 avgt 4 117.269 ? 6.623 ns/op EncoderBenchmarks.toStringGetBytes UTF-8 This is a message with unicode ?? 3 avgt 4 491.559 ? 68.169 ns/op Note: The above was ran on a local dev-machine typically producing less than _perfectly_ consistent results. ------------- PR: https://git.openjdk.java.net/jdk18/pull/20 From phedlin at openjdk.java.net Tue Dec 14 14:57:12 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 14 Dec 2021 14:57:12 GMT Subject: RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: Message-ID: On Mon, 6 Dec 2021 14:09:07 GMT, Patric Hedlin wrote: > Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support. > > Implementation focusing on balance between small footprint and efficiency, trying to utilise a dual SIMD path (e.g. Neoverse N1) for the additional Ascii-check. > > Testing: tier1-6 > > Benchmarks, 18-b26 vs. 
update (ran on Aurora/Ampere Altra): > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ASCII..........72.23% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:BIG5...........70.38% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:ISO_8859_15....67.81% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_16......... 3.72% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16384-type:UTF_8..........68.50% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:ASCII...........65.59% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:BIG5............60.59% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:ISO_8859_15.....63.79% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:UTF_16.......... 1.04% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:2048-type:UTF_8...........63.33% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:ASCII............57.25% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:BIG5.............49.33% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:ISO_8859_15......61.37% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:UTF_16........... 
0.02% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:512-type:UTF_8............54.75% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ASCII............54.52% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:BIG5.............40.41% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:ISO_8859_15......58.46% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_16...........-0.55% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:255-type:UTF_8............55.98% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ASCII............47.37% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:BIG5.............36.41% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:ISO_8859_15......50.83% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_16........... 8.63% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:127-type:UTF_8............48.95% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ASCII.............17.55% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:BIG5..............18.58% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:ISO_8859_15.......20.82% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_16............ 4.16% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:31-type:UTF_8.............18.44% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ASCII.............21.96% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:BIG5..............22.42% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:ISO_8859_15.......30.27% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_16............-1.17% > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:16-type:UTF_8.............35.99% > > openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ASCII............. 
6.19%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:BIG5.............. 7.34%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:ISO_8859_15....... 8.34%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_16............-0.46%
> openjdk.bench.java.nio.CharsetEncodeDecode.encode-size:15-type:UTF_8............. 6.80%

New PR on JDK-18 repo.
https://github.com/openjdk/jdk18/pull/20

-------------

PR: https://git.openjdk.java.net/jdk/pull/6723

From aph at openjdk.java.net Tue Dec 14 15:05:09 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 14 Dec 2021 15:05:09 GMT
Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v4]
In-Reply-To: <_ta0Bs5id1MQ2cBe3CtDxSpTYmImDuyyNl_6fIWgmjU=.9f9dd6a5-a557-4678-9eb1-4a27620b3534@github.com>
References: <_ta0Bs5id1MQ2cBe3CtDxSpTYmImDuyyNl_6fIWgmjU=.9f9dd6a5-a557-4678-9eb1-4a27620b3534@github.com>
Message-ID:

On Tue, 14 Dec 2021 13:20:47 GMT, Evgeny Astigeevich wrote:

>> This JVM SpinPause uses a spin wait stub. The stub is generated based on the `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides the description of the instruction and the instruction count.
>>
>> The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause.
>>
>> Testing results for fastdebug and release builds:
>> - `gtest`: Passed
>> - `tier1`...`tier4`: Passed
>> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed
>>
>> JVM SpinPause is used for the synchronised statements and can be benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`.
>>
>> Benchmarking results (number of samples per experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>>
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
>> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
>> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
>> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
>> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
>> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
>> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   RET only StubRoutines::aarch64::spin_wait() for SpinWait::NONE

Two more changes. The first makes the code simpler, and the second makes it less fragile.

diff --git a/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp b/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp
index a8b2820bb62..e946f3be970 100644
--- a/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp
@@ -6403,9 +6403,7 @@ class StubGenerator: public StubCodeGenerator {
     StubCodeMark mark(this, "StubRoutines", "spin_wait");
     address start = __ pc();
 
-    if (VM_Version::spin_wait_desc().inst() != SpinWait::NONE) {
-      __ spin_wait();
-    }
+    __ spin_wait();
     __ ret(lr);
 
     return start;
diff --git a/src/hotspot/cpu/aarch64/stubRoutines_aarch64.cpp b/src/hotspot/cpu/aarch64/stubRoutines_aarch64.cpp
index bb1a3325cea..f7c27ea7380 100644
--- a/src/hotspot/cpu/aarch64/stubRoutines_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/stubRoutines_aarch64.cpp
@@ -57,7 +57,10 @@ address StubRoutines::aarch64::_string_indexof_linear_uu = NULL;
 address StubRoutines::aarch64::_string_indexof_linear_ul = NULL;
 address StubRoutines::aarch64::_large_byte_array_inflate = NULL;
 address StubRoutines::aarch64::_method_entry_barrier = NULL;
-address StubRoutines::aarch64::_spin_wait = NULL;
+
+static void spin_wait_nop() { }
+address StubRoutines::aarch64::_spin_wait = CAST_FROM_FN_PTR(address, spin_wait_nop);
+
 bool StubRoutines::aarch64::_completed = false;
 
 /**

-------------

PR: https://git.openjdk.java.net/jdk/pull/6803

From coleenp at openjdk.java.net Tue Dec 14 15:28:27 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 14 Dec 2021 15:28:27 GMT
Subject: RFR: 8278791: Rename ClassLoaderData::holder_phantom
Message-ID:

Trivial change to change the name of holder_phantom.
Tested with cds/appcds tests locally.

-------------

Commit messages:
 - Fix comment above CLD::holder().
 - 8278791: Rename ClassLoaderData::holder_phantom

Changes: https://git.openjdk.java.net/jdk/pull/6834/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6834&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8278791
  Stats: 11 lines in 5 files changed: 1 ins; 3 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6834.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6834/head:pull/6834

PR: https://git.openjdk.java.net/jdk/pull/6834

From stefank at openjdk.java.net Tue Dec 14 15:28:28 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Tue, 14 Dec 2021 15:28:28 GMT
Subject: RFR: 8278791: Rename ClassLoaderData::holder_phantom
In-Reply-To:
References:
Message-ID:

On Tue, 14 Dec 2021 15:15:44 GMT, Coleen Phillimore wrote:

> Trivial change to change the name of holder_phantom.
> Tested with cds/appcds tests locally.

Thank you!

-------------

Marked as reviewed by stefank (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6834

From duke at openjdk.java.net Tue Dec 14 16:01:38 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 14 Dec 2021 16:01:38 GMT
Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v5]
In-Reply-To:
References:
Message-ID:

> This JVM SpinPause uses a spin wait stub. The stub is generated based on the `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides the description of the instruction and the instruction count.
>
> The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause.
>
> Testing results for fastdebug and release builds:
> - `gtest`: Passed
> - `tier1`...`tier4`: Passed
> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed
>
> JVM SpinPause is used for the synchronised statements and can be benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`.
>
> Benchmarking results (number of samples per experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Remove redundant check and guarantee non-null spin_wait

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6803/files
  - new: https://git.openjdk.java.net/jdk/pull/6803/files/53d16580..08e8ce4d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6803&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6803&range=03-04

  Stats: 7 lines in 2 files changed: 3 ins; 2 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6803.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6803/head:pull/6803

PR: https://git.openjdk.java.net/jdk/pull/6803

From duke at openjdk.java.net Tue Dec 14 16:01:39 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 14 Dec 2021 16:01:39 GMT
Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v4]
In-Reply-To:
References: <_ta0Bs5id1MQ2cBe3CtDxSpTYmImDuyyNl_6fIWgmjU=.9f9dd6a5-a557-4678-9eb1-4a27620b3534@github.com>
Message-ID:

On Tue, 14 Dec 2021 15:01:42 GMT, Andrew Haley wrote:

> Two more changes. The first makes the code simpler, and the second makes it less fragile.
>

Thank you! I forgot `__ spin_wait` generates nothing with `XX:OnSpinWaitInst=none`.
Done.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6803

From aph at openjdk.java.net Tue Dec 14 16:05:08 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 14 Dec 2021 16:05:08 GMT
Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v5]
In-Reply-To:
References:
Message-ID: <5daHIWXroddS2H3aUS1Fdz2GEsM7M7-W22JcI0Mb7xM=.70beee77-af30-4a5f-9d77-f65d10f04675@github.com>

On Tue, 14 Dec 2021 16:01:38 GMT, Evgeny Astigeevich wrote:

>> This JVM SpinPause uses a spin wait stub. The stub is generated based on the `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides the description of the instruction and the instruction count.
>>
>> The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause.
>>
>> Testing results for fastdebug and release builds:
>> - `gtest`: Passed
>> - `tier1`...`tier4`: Passed
>> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed
>>
>> JVM SpinPause is used for the synchronised statements and can be benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`.
>>
>> Benchmarking results (number of samples per experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>>
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
>> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
>> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
>> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
>> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
>> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
>> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove redundant check and guarantee non-null spin_wait

Marked as reviewed by aph (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6803

From coleenp at openjdk.java.net Tue Dec 14 16:29:09 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 14 Dec 2021 16:29:09 GMT
Subject: RFR: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation
In-Reply-To:
References:
Message-ID:

On Mon, 13 Dec 2021 23:14:43 GMT, Coleen Phillimore wrote:

> This change makes VM_Version_Ext part of VM_Version (the platform dependent part) and moves some duplicated code. x86 had the most code in VM_Version_Ext, so the most code moved there. There might be some unneeded functions but I didn't want to remove them with this change.
> > Tier1 (tier2-4 testing in progress) on linux and windows for x86, aarch64, Oracle platforms and tested builds on: > linux-aarch64-debug,linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug > and > linux-x64-zero,linux-x64-zero-debug,linux-x86-zero,linux-x86-zero-debug > > Ran JFR tests manually (it uses os_perf* CPUInformationInterface code). Thanks for the review, David. ------------- PR: https://git.openjdk.java.net/jdk/pull/6820 From coleenp at openjdk.java.net Tue Dec 14 16:29:10 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 14 Dec 2021 16:29:10 GMT Subject: RFR: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation In-Reply-To: <1XdDFHOWs-V8LCU8RhRL2ZH3E9l7WM_vI_QRZCoOcaw=.7e8cb129-046e-45d7-887e-ea2669c77d33@github.com> References: <1XdDFHOWs-V8LCU8RhRL2ZH3E9l7WM_vI_QRZCoOcaw=.7e8cb129-046e-45d7-887e-ea2669c77d33@github.com> Message-ID: On Tue, 14 Dec 2021 02:32:54 GMT, David Holmes wrote: >> This change makes VM_Version_Ext part of VM_Version (the platform dependent part) and moves some duplicated code. x86 had the most code in VM_Version_Ext, so the most code moved there. There might be some unneeded functions but I didn't want to remove them with this change. >> >> Tier1 (tier2-4 testing in progress) on linux and windows for x86, aarch64, Oracle platforms and tested builds on: >> linux-aarch64-debug,linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug >> and >> linux-x64-zero,linux-x64-zero-debug,linux-x86-zero,linux-x86-zero-debug >> >> Ran JFR tests manually (it uses os_perf* CPUInformationInterface code). > > src/hotspot/os/linux/os_perf_linux.cpp line 929: > >> 927: bool CPUInformationInterface::initialize() { >> 928: _cpu_info = new CPUInformation(); >> 929: VM_Version::initialize_cpu_information(); > > I can't figure out when this code will actually get executed in relation to the VM initialization process and VM_Version's initialization. 
Can this actually execute before that happens? Or could we assert that it has happened?

VM_Version::initialize() is called very early in Threads::create_vm. The latter, VM_Version::initialize_cpu_information, is called later, when a JFR event is emitted. It was "_ext" because it is part of JFR only. It seems that we might be able to consolidate this more now that it's moved later.
I don't think adding an assert would be meaningful here.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6820

From coleenp at openjdk.java.net Tue Dec 14 17:42:00 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 14 Dec 2021 17:42:00 GMT
Subject: RFR: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation [v2]
In-Reply-To:
References:
Message-ID:

> This change makes VM_Version_Ext part of VM_Version (the platform dependent part) and moves some duplicated code. x86 had the most code in VM_Version_Ext, so the most code moved there. There might be some unneeded functions but I didn't want to remove them with this change.
>
> Tier1 (tier2-4 testing in progress) on linux and windows for x86, aarch64, Oracle platforms and tested builds on:
> linux-aarch64-debug,linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug
> and
> linux-x64-zero,linux-x64-zero-debug,linux-x86-zero,linux-x86-zero-debug
>
> Ran JFR tests manually (it uses os_perf* CPUInformationInterface code).

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Added an initialization assert.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6820/files - new: https://git.openjdk.java.net/jdk/pull/6820/files/e611acb4..569896e9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6820&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6820&range=00-01 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6820.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6820/head:pull/6820 PR: https://git.openjdk.java.net/jdk/pull/6820 From coleenp at openjdk.java.net Tue Dec 14 17:42:05 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 14 Dec 2021 17:42:05 GMT Subject: RFR: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation [v2] In-Reply-To: References: <1XdDFHOWs-V8LCU8RhRL2ZH3E9l7WM_vI_QRZCoOcaw=.7e8cb129-046e-45d7-887e-ea2669c77d33@github.com> Message-ID: On Tue, 14 Dec 2021 16:24:10 GMT, Coleen Phillimore wrote: >> src/hotspot/os/linux/os_perf_linux.cpp line 929: >> >>> 927: bool CPUInformationInterface::initialize() { >>> 928: _cpu_info = new CPUInformation(); >>> 929: VM_Version::initialize_cpu_information(); >> >> I can't figure out when this code will actually get executed in relation to the VM initialization process and VM_Version's initialization. Can this actually execute before that happens? Or could we assert that it has happened? > > VM_Version::initialize() is called very early in Threads::create_vm. This latter VM_Version::initialize_cpu_information is called later when JFR event is emitted. The reason it was "_ext" was because it is part of JFR only. It seems that we might be able to consolidate this more later now that it's moved together. > I don't think adding an assert would be meaningful here. I did add a simple initialization assert in the x86 code where it might be interesting and reran the JFR tests. 
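
For readers following the thread, the idea of an initialization assert can be illustrated with a minimal C++ sketch. Everything in it is an assumption made for illustration: the class `CpuVersionSketch`, its members, and the placeholder feature string are hypothetical and are not the HotSpot declarations or the code in this PR. It only shows the general pattern of asserting that the early initialization step ran before the late, on-demand step.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a platform-dependent version class. The names
// are illustrative only and do not mirror the real VM_Version code.
class CpuVersionSketch {
 public:
  // Early step, analogous to initialization done at VM startup.
  static void initialize() {
    _features = "sse2 avx2";  // placeholder feature string (assumption)
    _initialized = true;
  }

  // Late, on-demand step, analogous to the JFR-only CPU information setup.
  static void initialize_cpu_information() {
    // Make the ordering assumption explicit instead of leaving it implicit.
    assert(_initialized && "initialize() must run before initialize_cpu_information()");
    if (_cpu_info_ready) {
      return;  // idempotent: calling it again does nothing
    }
    _cpu_description = "cpu with features: " + _features;
    _cpu_info_ready = true;
  }

  static bool cpu_info_ready() { return _cpu_info_ready; }
  static const std::string& cpu_description() { return _cpu_description; }

 private:
  static bool _initialized;
  static bool _cpu_info_ready;
  static std::string _features;
  static std::string _cpu_description;
};

// Out-of-class definitions of the static members.
bool CpuVersionSketch::_initialized = false;
bool CpuVersionSketch::_cpu_info_ready = false;
std::string CpuVersionSketch::_features;
std::string CpuVersionSketch::_cpu_description;
```

Calling `CpuVersionSketch::initialize()` and then `CpuVersionSketch::initialize_cpu_information()` passes the assert; calling them in the opposite order trips it in builds where `assert` is enabled.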
------------- PR: https://git.openjdk.java.net/jdk/pull/6820 From coleenp at openjdk.java.net Tue Dec 14 17:47:23 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 14 Dec 2021 17:47:23 GMT Subject: RFR: 8278791: Rename ClassLoaderData::holder_phantom In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 15:15:44 GMT, Coleen Phillimore wrote: > Trivial change to change the name of holder_phantom. > Tested with cds/appcds tests locally. tier1 on Oracle platforms passed successfully. Thanks for the code review, Stefan. ------------- PR: https://git.openjdk.java.net/jdk/pull/6834 From coleenp at openjdk.java.net Tue Dec 14 17:48:19 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 14 Dec 2021 17:48:19 GMT Subject: Integrated: 8278791: Rename ClassLoaderData::holder_phantom In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 15:15:44 GMT, Coleen Phillimore wrote: > Trivial change to change the name of holder_phantom. > Tested with cds/appcds tests locally. This pull request has now been integrated. Changeset: 3f91948c Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/3f91948c592d6968d2de6c59a5d93866f439c0e8 Stats: 11 lines in 5 files changed: 1 ins; 3 del; 7 mod 8278791: Rename ClassLoaderData::holder_phantom Reviewed-by: stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/6834 From sviswanathan at openjdk.java.net Tue Dec 14 18:23:31 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 14 Dec 2021 18:23:31 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 06:16:23 GMT, Smita Kamath wrote: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. Looks good to me. 
------------- Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/19 From sviswanathan at openjdk.java.net Tue Dec 14 18:27:34 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 14 Dec 2021 18:27:34 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 06:16:23 GMT, Smita Kamath wrote: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. @vnkozlov Could you please also review this patch? ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From kvn at openjdk.java.net Tue Dec 14 20:08:41 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 14 Dec 2021 20:08:41 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 06:16:23 GMT, Smita Kamath wrote: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. Yes, we need to reexecute because the code could be deoptimized during the `new_array()` allocation. But why do we allocate this temp array in the Java heap? Why not on the stack in stub code? Also, I noticed that the following early return in the intrinsics code could be moved up, before we generate new nodes in the graph: `if (Matcher::htbl_entries == -1) return false;` ------------- Changes requested by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk18/pull/19 From hseigel at openjdk.java.net Tue Dec 14 20:22:00 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 14 Dec 2021 20:22:00 GMT Subject: RFR: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation [v2] In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 17:42:00 GMT, Coleen Phillimore wrote: >> This change makes VM_Version_Ext part of VM_Version (the platform dependent part) and moves some duplicated code. x86 had the most code in VM_Version_Ext, so the most code moved there. There might be some unneeded functions but I didn't want to remove them with this change. >> >> Tier1 (tier2-4 testing in progress) on linux and windows for x86, aarch64, Oracle platforms and tested builds on: >> linux-aarch64-debug,linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug >> and >> linux-x64-zero,linux-x64-zero-debug,linux-x86-zero,linux-x86-zero-debug >> >> Ran JFR tests manually (it uses os_perf* CPUInformationInterface code). > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Added an initialization assert. Thanks for doing this! Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6820 From iklam at openjdk.java.net Tue Dec 14 20:50:23 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 14 Dec 2021 20:50:23 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble Message-ID: We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK. In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other. When CDS is disabled, we do not see such variations. 
In the slow case, there seems to be frequent dcache misses when loading the `Klass::_vtable_len` field, which is at offset 24 from the beginning of the Klass (see [bug report](https://bugs.openjdk.java.net/browse/JDK-8278020) for details). We suspect that the problem is with the layout of the CDS archive. Specifically, in CDS, Klass objects are inter-mixed with other metadata objects (such as Methods). In contrast, when CDS is disabled, (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separated from other metadata objects. My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore. As an experiment, I swapped `Klass::_vtable_len` with `Klass::_modifier_flags` (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size. I have no concrete proof that my theory is correct, but this change seems to be harmless. @ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks. 
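To make the cache-line theory above more concrete, here is a small illustrative sketch (not HotSpot code). It checks whether a field at a given offset inside an object falls in the same cache line as the last byte of whatever sits immediately before that object. The 64-byte line size, the helper names, and the example base addresses are assumptions chosen for illustration; only the offsets 24 and 164 come from the description above.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size; 64 bytes is common on x86_64 but not universal.
constexpr std::uintptr_t kCacheLineSize = 64;

// Index of the cache line that contains the given address.
constexpr std::uintptr_t cache_line_of(std::uintptr_t addr) {
  return addr / kCacheLineSize;
}

// True if the field at klass_base + field_offset lies in the same cache line
// as the last byte of the object that ends right before klass_base.
constexpr bool shares_line_with_preceding_object(std::uintptr_t klass_base,
                                                 std::size_t field_offset) {
  return klass_base > 0 &&
         cache_line_of(klass_base + field_offset) ==
             cache_line_of(klass_base - 1);
}
```

Under these assumptions, an object placed at 0x1020 has its offset-24 field in the same line as the preceding object's tail, while the same object placed at 0x1000 does not, and a field at offset 164 is several lines away in both cases. That is consistent with the idea that a small nudge in archive layout could make the effect appear or disappear, though it does not prove the theory.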
------------- Commit messages: - 8278020: ~13% variation in Renaissance-Scrabble Changes: https://git.openjdk.java.net/jdk/pull/6838/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6838&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278020 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6838.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6838/head:pull/6838 PR: https://git.openjdk.java.net/jdk/pull/6838 From kvn at openjdk.java.net Tue Dec 14 21:33:08 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 14 Dec 2021 21:33:08 GMT Subject: RFR: 8277893: Arraycopy stress tests [v4] In-Reply-To: References: Message-ID: On Thu, 9 Dec 2021 07:12:47 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. >> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. 
>> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 4m6.192s >> user 52m50.523s >> sys 0m13.755s >> >> # x86_64 (TR 3970X) -XX:+UseZGC >> real 6m2.573s >> user 72m43.541s >> sys 0m25.697s >> >> # x86_32 (TR 3970X) >> real 6m56.405s >> user 92m56.377s >> sys 0m6.677s >> >> # x86_64 (i5-11500) >> real 29m19.024s >> user 103m52.925s >> sys 1m7.175s >> >> # AArch64 (ThunderX2) >> real 2m59.623s >> user 26m14.624s >> sys 0m9.771s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Bump timeout to 7200 > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - Package declarations > - Add safety check for small systems > - Renames > - Single driver for all the tests > - Safer timeout settings > - Post-merge TEST.groups cleanup > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - ... and 3 more: https://git.openjdk.java.net/jdk/compare/382293c9...b749c367 Good. Something happened with notifications. I also did not get your response. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6594 From coleenp at openjdk.java.net Tue Dec 14 23:31:00 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 14 Dec 2021 23:31:00 GMT Subject: RFR: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation [v2] In-Reply-To: References: Message-ID: <1cNfNfX75OoH6vniMzrDpVFOuMwQ8y-gDbOyppYvKyw=.d23a5d9a-abaf-4f6c-b77d-67f71e67fd63@github.com> On Tue, 14 Dec 2021 17:42:00 GMT, Coleen Phillimore wrote: >> This change makes VM_Version_Ext part of VM_Version (the platform dependent part) and moves some duplicated code. x86 had the most code in VM_Version_Ext, so the most code moved there. There might be some unneeded functions but I didn't want to remove them with this change. >> >> Tier1 (tier2-4 testing in progress) on linux and windows for x86, aarch64, Oracle platforms and tested builds on: >> linux-aarch64-debug,linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug >> and >> linux-x64-zero,linux-x64-zero-debug,linux-x86-zero,linux-x86-zero-debug >> >> Ran JFR tests manually (it uses os_perf* CPUInformationInterface code). > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Added an initialization assert. Thanks for reviewing, Harold! ------------- PR: https://git.openjdk.java.net/jdk/pull/6820 From dholmes at openjdk.java.net Wed Dec 15 01:19:00 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 15 Dec 2021 01:19:00 GMT Subject: RFR: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation [v2] In-Reply-To: References: Message-ID: <1b76ouVvE4CqegpZSEyBqgRctNPV2Xo04zRZkFyXrlA=.4590760e-56c7-4158-8861-f508e74e8ce8@github.com> On Tue, 14 Dec 2021 17:42:00 GMT, Coleen Phillimore wrote: >> This change makes VM_Version_Ext part of VM_Version (the platform dependent part) and moves some duplicated code. x86 had the most code in VM_Version_Ext, so the most code moved there. 
There might be some unneeded functions but I didn't want to remove them with this change. >> >> Tier1 (tier2-4 testing in progress) on linux and windows for x86, aarch64, Oracle platforms and tested builds on: >> linux-aarch64-debug,linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug >> and >> linux-x64-zero,linux-x64-zero-debug,linux-x86-zero,linux-x86-zero-debug >> >> Ran JFR tests manually (it uses os_perf* CPUInformationInterface code). > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Added an initialization assert. Marked as reviewed by dholmes (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6820 From dholmes at openjdk.java.net Wed Dec 15 01:38:58 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 15 Dec 2021 01:38:58 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 18:45:55 GMT, Ioi Lam wrote: > We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK. In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other. > > When CDS is disabled, we do not see such variations. > > In the slow case, there seems to be frequent dcache misses when loading the `Klass::_vtable_len` field, which is at offset 24 from the beginning of the Klass (see [bug report](https://bugs.openjdk.java.net/browse/JDK-8278020) for details). > > We suspect that the problem is with the layout of the CDS archive. Specifically, in CDS, Klass objects are inter-mixed with other metadata objects (such as Methods). In contrast, when CDS is disabled, (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separated from other metadata objects. 
> > My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore. > > As an experiment, I swapped `Klass::_vtable_len` with `Klass::_modifier_flags` (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size. > > I have no concrete proof that my theory is correct, but this change seems to be harmless. @ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks. I would have kept the two fields together after the switch so that you can add a comment. Seems totally bizarre that two such separated fields would have such an effect. Isn't this needed in 18 though? ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From kvn at openjdk.java.net Wed Dec 15 01:49:57 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 01:49:57 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 18:45:55 GMT, Ioi Lam wrote: > We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK. In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other. > > When CDS is disabled, we do not see such variations. > > In the slow case, there seems to be frequent dcache misses when loading the `Klass::_vtable_len` field, which is at offset 24 from the beginning of the Klass (see [bug report](https://bugs.openjdk.java.net/browse/JDK-8278020) for details). > > We suspect that the problem is with the layout of the CDS archive.
Specifically, in CDS, Klass objects are inter-mixed with other metadata objects (such as Methods). In contrast, when CDS is disabled, (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separated from other metadata objects. > > My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore. > > As an experiment, I swapped `Klass::_vtable_len` with `Klass::_modifier_flags` (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size. > > I have no concrete proof that my theory is correct, but this change seems to be harmless. @ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks. Yes, this small change is good for JDK 18. Is it possible for JDK 19 (or later) to make CDS segmented to separate different types of data (Klass, Method, etc)? Another experiment that could be done is to add padding (space) before the Klass data to make sure it is in a different cache line. ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From pli at openjdk.java.net Wed Dec 15 02:26:00 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Wed, 15 Dec 2021 02:26:00 GMT Subject: RFR: 8277619: AArch64: Incorrect parameter type in Advanced SIMD Copy assembler functions In-Reply-To: References: Message-ID: On Wed, 1 Dec 2021 09:49:23 GMT, Fei Gao wrote: > Mov (from general), incorrectly uses SIMD_Arrangement as the parameter > type of the assembler function. However, from Arm ARM [1], it's more > precise to use SIMD_RegVariant here. > > The situation is similar to Mov(to general) [2].
> > Note that as Mov(to general) is an alias of UMOV, we turn to re-use > UMOV encoding for Mov(to general) in this patch. > > [1] https://developer.arm.com/documentation/ddi0602/2020-12/SIMD-FP-Instructions/MOV--from-general---Move-general-purpose-register-to-a-vector-element--an-alias-of-INS--general-- > [2] https://developer.arm.com/documentation/ddi0602/2020-12/SIMD-FP-Instructions/MOV--to-general---Move-vector-element-to-general-purpose-register--an-alias-of-UMOV- Marked as reviewed by pli (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6629 From fgao at openjdk.java.net Wed Dec 15 02:30:03 2021 From: fgao at openjdk.java.net (Fei Gao) Date: Wed, 15 Dec 2021 02:30:03 GMT Subject: Integrated: 8277619: AArch64: Incorrect parameter type in Advanced SIMD Copy assembler functions In-Reply-To: References: Message-ID: <1pddjU3ikeReukL53dsA-WpA36scAt-kxVEk4vYLCzQ=.c9605839-9c1c-440a-9bd5-dabf8eb8a1ae@github.com> On Wed, 1 Dec 2021 09:49:23 GMT, Fei Gao wrote: > Mov (from general), incorrectly uses SIMD_Arrangement as the parameter > type of the assembler function. However, from Arm ARM [1], it's more > precise to use SIMD_RegVariant here. > > The situation is similar to Mov(to general) [2]. > > Note that as Mov(to general) is an alias of UMOV, we turn to re-use > UMOV encoding for Mov(to general) in this patch. > > [1] https://developer.arm.com/documentation/ddi0602/2020-12/SIMD-FP-Instructions/MOV--from-general---Move-general-purpose-register-to-a-vector-element--an-alias-of-INS--general-- > [2] https://developer.arm.com/documentation/ddi0602/2020-12/SIMD-FP-Instructions/MOV--to-general---Move-vector-element-to-general-purpose-register--an-alias-of-UMOV- This pull request has now been integrated. 
Changeset: c442587f Author: Fei Gao Committer: Pengfei Li URL: https://git.openjdk.java.net/jdk/commit/c442587f1e72a614302cd76c20e13f1cb1703641 Stats: 52 lines in 7 files changed: 1 ins; 3 del; 48 mod 8277619: AArch64: Incorrect parameter type in Advanced SIMD Copy assembler functions Reviewed-by: aph, pli ------------- PR: https://git.openjdk.java.net/jdk/pull/6629 From iklam at openjdk.java.net Wed Dec 15 04:19:35 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 15 Dec 2021 04:19:35 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: References: Message-ID: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> > We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK. In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other. > > When CDS is disabled, we do not see such variations. > > In the slow case, there seems to be frequent dcache misses when loading the `Klass::_vtable_len` field, which is at offset 24 from the beginning of the Klass (see [bug report](https://bugs.openjdk.java.net/browse/JDK-8278020) for details). > > We suspect that the problem is with the layout of the CDS archive. Specifically, in CDS, Klass objects are inter-mixed with other metadata objects (such as Methods). In contrast, when CDS is disabled, (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separated from other metadata objects. > > My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore. 
> > As an experiment, I swapped `Klass::_vtable_len` with `Klass::_modifier_flags` (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size. > > I have no concrete proof that my theory is correct, but this change seems to be harmless. @ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks. Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: added comments about the location of vtable_len ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6838/files - new: https://git.openjdk.java.net/jdk/pull/6838/files/fd1318a3..55e71805 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6838&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6838&range=00-01 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6838.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6838/head:pull/6838 PR: https://git.openjdk.java.net/jdk/pull/6838 From iklam at openjdk.java.net Wed Dec 15 04:27:04 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 15 Dec 2021 04:27:04 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 01:35:42 GMT, David Holmes wrote: > I would have kept the two fields together after the switch so that you can add a comment. Seems totally bizarre that two such separated fields would have such an affect. > > Isn't this needed in 18 though? David, if my theory is correct, the contention does not happen between the two fields. It happens between the `_vtable_len` field, and the object that immediately precedes the Klass. I have not found out what that other object is. Eric is writing a simplified version of the benchmark and I hope to use that to narrow down the problem. 
I have added comments near _vtable_len to explain why it's placed there inside the Klass. I swapped it with `_modifier_flags` because they are the same size, and `_modifier_flags` doesn't seem to be accessed nearly as often as `_vtable_len`. ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From iklam at openjdk.java.net Wed Dec 15 04:27:04 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 15 Dec 2021 04:27:04 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 01:45:35 GMT, Vladimir Kozlov wrote: > Yes, this small change is good for JDK 18. Is it possible for JDK 19 (or later) to make CDS segmented to separate different types of data (Klass, Method, etc)? It's possible to make CDS segmented so that the Klasses are allocated together. That can be done in 19. > An other experiment could be done is to add padding (space) before Klass data to make sure it is in different cache line. That's a good idea. I will try that to see if it has the same effect as the current patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From dholmes at openjdk.java.net Wed Dec 15 05:39:00 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 15 Dec 2021 05:39:00 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> References: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> Message-ID: <8Li5sXlZkk1OleY6Q5RKpDTdOQSTenCUfci5qfr9xJw=.ddce64ee-0481-4623-808d-caa33829ad8d@github.com> On Wed, 15 Dec 2021 04:19:35 GMT, Ioi Lam wrote: >> We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK. 
In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other. >> >> When CDS is disabled, we do not see such variations. >> >> In the slow case, there seems to be frequent dcache misses when loading the `Klass::_vtable_len` field, which is at offset 24 from the beginning of the Klass (see [bug report](https://bugs.openjdk.java.net/browse/JDK-8278020) for details). >> >> We suspect that the problem is with the layout of the CDS archive. Specifically, in CDS, Klass objects are inter-mixed with other metadata objects (such as Methods). In contrast, when CDS is disabled, (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separated from other metadata objects. >> >> My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore. >> >> As an experiment, I swapped `Klass::_vtable_len` with `Klass::_modifier_flags` (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size. >> >> I have no concrete proof that my theory is correct, but this change seems to be harmless. @ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > added comments about the location of vtable_len Marked as reviewed by dholmes (Reviewer). I approved but still think this should be targeted at 18 - assuming this was a performance regression in 18. 
Padding may have the same performance effect but will also impact footprint potentially - which in turn may impact the caching behaviour. ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From stuefe at openjdk.java.net Wed Dec 15 05:49:59 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 15 Dec 2021 05:49:59 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> References: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> Message-ID: On Wed, 15 Dec 2021 04:19:35 GMT, Ioi Lam wrote: >> We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK. In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other. >> >> When CDS is disabled, we do not see such variations. >> >> In the slow case, there seems to be frequent dcache misses when loading the `Klass::_vtable_len` field, which is at offset 24 from the beginning of the Klass (see [bug report](https://bugs.openjdk.java.net/browse/JDK-8278020) for details). >> >> We suspect that the problem is with the layout of the CDS archive. Specifically, in CDS, Klass objects are inter-mixed with other metadata objects (such as Methods). In contrast, when CDS is disabled, (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separated from other metadata objects. >> >> My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore.
>> >> As an experiment, I swapped `Klass::_vtable_len` with `Klass::_modifier_flags` (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size. >> >> I have no concrete proof that my theory is correct, but this change seems to be harmless. @ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > added comments about the location of vtable_len Hi Ioi, The fix looks fine. This is interesting to me, because in the context of Lilliput (https://github.com/openjdk/lilliput/pull/13) I was kind of counting on CDS to intermix Klass and non-class metadata, since that way CDS uses the larger Klass alignment gaps. In fact, I have this wild idea to shape metaspace in that form, merging Klass and non-class metadata into one larger class space. It would be really good to have a better idea of these interactions. What tool did you use to measure the dcache misses? Cheers, Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6838 From kvn at openjdk.java.net Wed Dec 15 06:06:00 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 06:06:00 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> References: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> Message-ID: On Wed, 15 Dec 2021 04:19:35 GMT, Ioi Lam wrote: >> We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK.
In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other. >> >> When CDS is disabled, we do not see such variations. >> >> In the slow case, there seems to be frequent dcache misses when loading the `Klass::_vtable_len` field, which is at offset 24 from the beginning of the Klass (see [bug report](https://bugs.openjdk.java.net/browse/JDK-8278020) for details). >> >> We suspect that the problem is with the layout of the CDS archive. Specifically, in CDS, Klass objects are inter-mixed with other metadata objects (such as Methods). In contrast, when CDS is disabled, (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separated from other metadata objects. >> >> My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore. >> >> As an experiment, I swapped `Klass::_vtable_len` with `Klass::_modifier_flags` (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size. >> >> I have no concrete proof that my theory is correct, but this change seems to be harmless. @ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > added comments about the location of vtable_len Marked as reviewed by kvn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From kvn at openjdk.java.net Wed Dec 15 06:06:00 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 06:06:00 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: <8Li5sXlZkk1OleY6Q5RKpDTdOQSTenCUfci5qfr9xJw=.ddce64ee-0481-4623-808d-caa33829ad8d@github.com> References: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> <8Li5sXlZkk1OleY6Q5RKpDTdOQSTenCUfci5qfr9xJw=.ddce64ee-0481-4623-808d-caa33829ad8d@github.com> Message-ID: On Wed, 15 Dec 2021 05:35:26 GMT, David Holmes wrote: > I approved but still think this should be targeted at 18 - assuming this was a performance regression in 18. > > Padding may have the same performance effect but will also impact footprint potentially - which in turn may impact the caching behaviour. I suggested padding only as an experiment to prove Ioi's theory. The current changes are good as the fix. ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From kvn at openjdk.java.net Wed Dec 15 06:08:28 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 06:08:28 GMT Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation Message-ID: A proper fix for this is to use the catchException combinator. However, that introduces a significant cold-startup performance regression. JDK-8278447 tracks the work to address the performance regression using the catchException and asSpreader combinators. It may require significant work and refactoring, which is risky for JDK 18. It is proposed to implement a workaround in C2 that whitelists the relevant methods (all methods in the sun.invoke.util.ValueConversions class) so that the stack trace is not omitted when an exception is thrown in them. Added a new regression test. Tested tier1-3.
------------- Commit messages: - 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation Changes: https://git.openjdk.java.net/jdk18/pull/27/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=27&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277964 Stats: 151 lines in 7 files changed: 149 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk18/pull/27.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/27/head:pull/27 PR: https://git.openjdk.java.net/jdk18/pull/27 From iklam at openjdk.java.net Wed Dec 15 06:15:58 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 15 Dec 2021 06:15:58 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: References: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> <8Li5sXlZkk1OleY6Q5RKpDTdOQSTenCUfci5qfr9xJw=.ddce64ee-0481-4623-808d-caa33829ad8d@github.com> Message-ID: <2EwHhws1jv5OwWRUCNFakaKO20khaA3sH3a3VuBYQRI=.d1532862-2c7e-4ed2-8c4b-a1ec681c007f@github.com> On Wed, 15 Dec 2021 06:02:35 GMT, Vladimir Kozlov wrote: > I approved but still think this should be targeted at 18 - assuming this was a performance regression in 18. > > Padding may have the same performance effect but will also impact footprint potentially - which in turn may impact the caching behaviour. @ericcaspole checked our benchmark database and the regression seems to have started around JDK 15. So yes, I will backport the fix to 17 and 18. I want to integrate into the mainline first so it can be baked a little before the backport.
------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From stuefe at openjdk.java.net Wed Dec 15 06:15:59 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 15 Dec 2021 06:15:59 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> References: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> Message-ID: On Wed, 15 Dec 2021 04:19:35 GMT, Ioi Lam wrote: > My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore. BTW you could test that theory, if you wanted, by repeating the test with CDS off and disabling compressed class pointers. ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From iklam at openjdk.java.net Wed Dec 15 06:26:06 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 15 Dec 2021 06:26:06 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> References: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> Message-ID: On Wed, 15 Dec 2021 04:19:35 GMT, Ioi Lam wrote: > Hi Ioi, > > The fix looks fine. > > this is interesting to me, because in the context of Lilliput ([openjdk/lilliput#13](https://github.com/openjdk/lilliput/pull/13)) I was kind of counting on CDS to intermix Klass and non-class metadata, since that way CDS uses the larger Klass alignment gaps. In fact, I have this wild idea to shape metaspace in that form, merging Klass and non-class metadata into one larger class space.
It would be really good to have a better idea of these interactions. > > What tool did you use to measure the dcache misses? > > Cheers, Thomas Hi Thomas, @ericcaspole did the measurements so he will have more information, but I believe he used https://github.com/jvm-profiling-tools/async-profiler to generate traces like this (which I pasted into the bug report):

    Column 1: cycles (125424 events)
    Column 2: l1d_pend_miss.pending_cycles (56716 events)
    Column 3: CYCLE_ACTIVITY.CYCLES_L2_MISS (66170 events)

     0.08%   0.02%   0.03%   0x00007f488cda2dc8: mov 0x10(%r10),%r11d
    12.26%  16.97%  16.23%   0x00007f488cda2dcc: lea 0x1b8(%r10,%r11,8),%r11

@vnkozlov I found that most Klasses in CDS are preceded by a Method. Does the jitted code write into a Method often? ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From stuefe at openjdk.java.net Wed Dec 15 06:33:58 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 15 Dec 2021 06:33:58 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: References: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> Message-ID: On Wed, 15 Dec 2021 06:22:29 GMT, Ioi Lam wrote: > > Hi Ioi, > > The fix looks fine. > > this is interesting to me, because in the context of Lilliput ([openjdk/lilliput#13](https://github.com/openjdk/lilliput/pull/13)) I was kind of counting on CDS to intermix Klass and non-class metadata, since that way CDS uses the larger Klass alignment gaps. In fact, I have this wild idea to shape metaspace in that form, merging Klass and non-class metadata into one larger class space. It would be really good to have a better idea of these interactions. > > What tool did you use to measure the dcache misses?
> > Cheers, Thomas > > Hi Thomas, > > @ericcaspole did the measurements so he will have more information, but I believe he used https://github.com/jvm-profiling-tools/async-profiler to generate traces like this (which I pasted into the bug report): > > ``` > Column 1: cycles (125424 events) > Column 2: l1d_pend_miss.pending_cycles (56716 events) > Column 3: CYCLE_ACTIVITY.CYCLES_L2_MISS (66170 events) > > 0.08% 0.02% 0.03% 0x00007f488cda2dc8: mov 0x10(%r10),%r11d > 12.26% 16.97% 16.23% 0x00007f488cda2dcc: lea 0x1b8(%r10,%r11,8),%r11 > ``` Okay, thanks. > > @vnkozlov I found that most Klasses in CDS are preceded by a Method. Does the jitted code write into a Method often? Method counters? May be worth spreading them out better, or to pad them to prevent false sharing. ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From dholmes at openjdk.java.net Wed Dec 15 07:02:59 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 15 Dec 2021 07:02:59 GMT Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation In-Reply-To: References: Message-ID: <2Kb5Q961w7Kw69AeNYu840nXSIoTi2lAlsZyKZk3mis=.9f063cd0-f593-48f6-bdb2-f99c52529cf0@github.com> On Wed, 15 Dec 2021 05:59:45 GMT, Vladimir Kozlov wrote: > A proper fix for this is to use the catchException combination. However, that introduces significant cold startup performance regression. JDK-8278447 tracks the work to address the performance regression using catchException and asSpreader combinator. It may require significant work and refactoring which is risky for JDK 18. > > It is proposed to implement a workaround in C2 to white list the relevant methods (all methods in sun.invoke.util.ValueConversions class) not to omit stack trace when exception is thrown in them. > > Added new regression test. Tested tier1-3. 
src/hotspot/share/oops/method.cpp line 827: > 825: */ > 826: bool Method::can_omit_stack_trace() { > 827: if (method_holder()->class_loader_data()->is_boot_class_loader_data()) { Do you actually need to check this? ------------- PR: https://git.openjdk.java.net/jdk18/pull/27 From thartmann at openjdk.java.net Wed Dec 15 07:06:59 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 15 Dec 2021 07:06:59 GMT Subject: RFR: 8183390: Fix and re-enable post loop vectorization In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 08:48:25 GMT, Pengfei Li wrote: > ### Background > > Post loop vectorization is a C2 compiler optimization in an experimental > VM feature called PostLoopMultiversioning. It transforms the range-check > eliminated post loop to a 1-iteration vectorized loop with vector mask. > This optimization was contributed by Intel in 2016 to support x86 AVX512 > masked vector instructions. However, it was disabled soon after an issue > was found. Due to insufficient maintenance in these years, multiple bugs > have been accumulated inside. But we (Arm) still think this is a useful > framework for vector mask support in C2 auto-vectorized loops, for both > x86 AVX512 and AArch64 SVE. Hence, we propose this to fix and re-enable > post loop vectorization. > > ### Changes in this patch > > This patch reworks post loop vectorization. The most significant change > is removing vector mask support in C2 x86 backend and re-implementing > it in the mid-end. With this, we can re-enable post loop vectorization > for platforms other than x86. > > Previous implementation hard-codes x86 k1 register as a reserved AVX512 > opmask register and defines two routines (setvectmask/restorevectmask) > to set and restore the value of k1. But after [JDK-8211251](https://bugs.openjdk.java.net/browse/JDK-8211251) which encodes > AVX512 instructions as unmasked by default, generated vector masks are > no longer used in AVX512 vector instructions. 
To fix incorrect codegen
> and add vector mask support for more platforms, we instead add a vector
> mask input to C2 mid-end IRs. Specifically, we use a VectorMaskGenNode
> to generate a mask and replace all Load/Store nodes in the post loop
> with LoadVectorMasked/StoreVectorMasked nodes that take that mask input.
> This IR form is exactly the same as the one used in VectorAPI mask
> support. For now, we only add mask inputs for Load/Store nodes because
> reduction operations are not supported in post loop vectorization.
> After this change, the x86 k1 register is no longer reserved and can be
> allocated when PostLoopMultiversioning is enabled.
>
> Besides this change, we have fixed a compiler crash and five incorrect
> result issues with post loop vectorization.
>
> **I) C2 crashes with segmentation fault in strip-mined loops**
>
> The previous implementation was done before C2 loop strip-mining was
> merged into JDK master, so it didn't take strip-mined loops into
> consideration. In C2's strip-mined loops, the post loop is not the
> sibling of the main loop in the ideal loop tree. Instead, it's the
> sibling of the main loop's parent. This patch fixes a SIGSEGV caused by
> a NULL pointer when locating the post loop from a strip-mined main loop.
>
> **II) Incorrect result issues with post loop vectorization**
>
> We have also fixed five incorrect vectorization issues. Some of them are
> hidden deep and can only be reproduced with corner cases. These issues
> have a common cause: the code assumes the post loop can be vectorized if
> vectorization of the corresponding main loop is successful. But in many
> cases this assumption is wrong. Details are below.
>
> - **[Issue-1] Incorrect vectorization for partially vectorizable loops**
>
> This issue can be reproduced by the loop below, where only some
> operations in the loop body are vectorizable.
>
>     for (int i = 0; i < 10000; i++) {
>       res[i] = a[i] * b[i];
>       k = 3 * k + 1;
>     }
>
> In the main loop, superword can work well if parts of the operations in
> the loop body are not vectorizable, since those parts can simply be
> unrolled. But for post loops, we don't create vectors by combining
> scalar IRs generated from loop unrolling. Instead, we do scalar-to-vector
> replacement for all operations in the loop body. Hence, all operations
> should be either vectorized together or not vectorized at all. To fix
> such cases, we add an extra field "_slp_vector_pack_count" in
> CountedLoopNode to record the eventual count of vector packs in the main
> loop. This value is then passed to the post loop and compared with the
> post loop pack count. Vectorization bails out in the post loop if it
> creates more vector packs than in the main loop.
>
> - **[Issue-2] Incorrect result in loops with growing-down vectors**
>
> This issue appears with growing-down vectors, that is, vectors that grow
> toward smaller memory addresses as the loop iterates. It can be
> reproduced by the counting-up loop below, which has a negative scale
> value in the array index.
>
>     for (int i = 0; i < 10000; i++) {
>       a[MAX - i] = b[MAX - i];
>     }
>
> The cause of this issue is that for a growing-down vector, the generated
> vector mask value has reversed vector-lane order, so it masks the wrong
> vector lanes. Note that if a negative scale value appears in a
> counting-down loop, the vector grows up. With this rule, we fix the
> issue by only allowing positive array index scales in counting-up loops
> and negative array index scales in counting-down loops. This check is
> done with the help of SWPointer by comparing the scale value of each
> memory access in the loop with the loop stride value.
>
> - **[Issue-3] Incorrect result in manually unrolled loops**
>
> This issue can be reproduced by the manually unrolled loop below.
>
>     for (int i = 0; i < 10000; i += 2) {
>       c[i] = a[i] + b[i];
>       c[i + 1] = a[i + 1] * b[i + 1];
>     }
>
> In this loop, operations in the 2nd statement duplicate those in the 1st
> statement with a small memory address offset. Vectorization in the main
> loop works well in this case because C2 does further unrolling and pack
> combination. But we cannot vectorize the post loop by replacing scalars
> with vectors because that creates duplicated vector operations.
> To fix this, we restrict post loop vectorization to loops with stride
> values of 1 or -1.
>
> - **[Issue-4] Incorrect result in loops with mixed vector element sizes**
>
> This issue was found after we enabled post loop vectorization for
> AArch64. It's reproducible with multiple array operations of different
> element sizes inside a loop. On x86, there is no issue because the
> values of x86 AVX512 opmasks only depend on which vector lanes are
> active. But AArch64 is different - the values of SVE predicates also
> depend on the lane size of the vector. Hence, on AArch64 SVE, if a loop
> has mixed vector element sizes, we should use different vector masks.
> For now, we only support loops with a single vector element size, i.e.,
> "int + float" vectors in a single loop are ok but "int + double" vectors
> in a single loop are not vectorizable. This fix also enables subword
> vector support to make all primitive type array operations vectorizable.
>
> - **[Issue-5] Incorrect result in loops with potential data dependence**
>
> This issue can be reproduced by the corner case below, on AArch64 only.
>
>     for (int i = 0; i < 10000; i++) {
>       a[i] = x;
>       a[i + OFFSET] = y;
>     }
>
> In this case, the two stores in the loop have a data dependence if the
> OFFSET value is smaller than the vector length. So we cannot vectorize
> by replacing scalars with vectors. But the main loop vectorization
> in this case is successful on AArch64 because AArch64 has partial vector
> load/store support.
It splits a vector fill with different values in lanes
> into several smaller-sized fills. In this patch, we add an additional
> data dependence check for such cases. The check is also done with the
> help of the SWPointer class. In this check, we require that every two
> memory accesses (with at least one store) of the same element type (or
> subword size) in the loop have the same array index expression.
>
> ### Tests
>
> So far we have tested full jtreg on both x86 AVX512 and AArch64 SVE with
> the experimental VM option "PostLoopMultiversioning" turned on. We found
> no issues in any test. We noticed that the existing cases are not enough
> because some of the above issues are not caught by them. We would like
> to add some new cases but we found existing vectorization tests are a
> bit cumbersome - golden results must be pre-calculated and hard-coded in
> the test code for correctness verification. Thus, in this patch, we
> propose a new vectorization testing framework.
>
> Our new framework brings a simpler way to add new cases. For a new test
> case, we only need to create a new method annotated with "@Test". The
> test runner will invoke each annotated method twice automatically. The
> first time it runs in the interpreter and the second time it is
> force-compiled by C2. Then the two return results are compared. So in
> this framework each test method should return a primitive value or an
> array of primitives. In this way, no extra verification code for
> vectorization correctness is required. This test runner is still
> jtreg-based and takes advantage of the jtreg WhiteBox API, which enables
> running test methods at specific compilation levels. Each test class
> inside is also jtreg-based. It just needs to inherit from the test
> runner class and run with two additional options "-Xbootclasspath/a:."
> and "-XX:+WhiteBoxAPI".
>
> ### Summary & Future work
>
> In this patch, we reworked post loop vectorization.
We made it platform > independent and fixed several issues inside. We also implemented a new > vectorization testing framework with many test cases inside. Meanwhile, > we did some code cleanups. > > This patch only touches C2 code guarded with PostLoopMultiversioning, > except a few data structure changes. So, there's no behavior change when > experimental VM option PostLoopMultiversioning is off. Also, to reduce > risks, we still propose to keep post loop vectorization experimental for > now. But if it receives positive feedback, we would like to change it to > non-experimental in the future. I haven't looked at the code yet but just gave this a quick run through our testing. I'm seeing several hundred failures: java.lang.ClassNotFoundException: compiler/vectorization/runner/ArrayCopyTest at java.base/java.lang.Class.forName0(Native Method) at java.base/java.lang.Class.forName(Class.java:383) at java.base/java.lang.Class.forName(Class.java:376) at compiler.vectorization.runner.VectorizationTestRunner.createTestInstance(VectorizationTestRunner.java:183) at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:199) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:577) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:833) java.lang.RuntimeException: Cannot create test instance for class compiler/vectorization/runner/ArrayCopyTest at compiler.vectorization.runner.VectorizationTestRunner.fail(VectorizationTestRunner.java:195) at compiler.vectorization.runner.VectorizationTestRunner.createTestInstance(VectorizationTestRunner.java:188) at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:199) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at 
java.base/java.lang.reflect.Method.invoke(Method.java:577) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:833) Similar ClassNotFoundExceptions happen with the other tests. The new tests also intermittently time out with `-Xcomp`. java.lang.RuntimeException: Test failed in compiler.vectorization.runner.ArrayIndexFillTest.fillByteArray: Method is not compiled after 30s. at compiler.vectorization.runner.VectorizationTestRunner.run(VectorizationTestRunner.java:72) at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:200) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:577) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:833) Running with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`. Similar failures happen with the other tests. There are also failures in the pre-submit tests. ------------- PR: https://git.openjdk.java.net/jdk/pull/6828 From pli at openjdk.java.net Wed Dec 15 07:27:59 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Wed, 15 Dec 2021 07:27:59 GMT Subject: RFR: 8183390: Fix and re-enable post loop vectorization In-Reply-To: References: Message-ID: <8xK2PmI8ViNhtdSJhC2OEHjbLEc2rKCQRBAFGLw13G8=.ec7e1206-98b3-4777-a1aa-9247dfcb7bd6@github.com> On Wed, 15 Dec 2021 07:04:12 GMT, Tobias Hartmann wrote: > > I haven't looked at the code yet but just gave this a quick run through our testing.
> I'm seeing several hundred failures:
>
> java.lang.ClassNotFoundException: compiler/vectorization/runner/ArrayCopyTest
>     at java.base/java.lang.Class.forName0(Native Method)
>     at java.base/java.lang.Class.forName(Class.java:383)
>     at java.base/java.lang.Class.forName(Class.java:376)
>     at compiler.vectorization.runner.VectorizationTestRunner.createTestInstance(VectorizationTestRunner.java:183)
>     at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:199)
>     at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:577)
>     at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
>     at java.base/java.lang.Thread.run(Thread.java:833)
> java.lang.RuntimeException: Cannot create test instance for class compiler/vectorization/runner/ArrayCopyTest
>     at compiler.vectorization.runner.VectorizationTestRunner.fail(VectorizationTestRunner.java:195)
>     at compiler.vectorization.runner.VectorizationTestRunner.createTestInstance(VectorizationTestRunner.java:188)
>     at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:199)
>     at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:577)
>     at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
>     at java.base/java.lang.Thread.run(Thread.java:833)
>
> Similar ClassNotFoundExceptions happen with the other tests. The new tests also intermittently time out with `-Xcomp`.
>
>
> java.lang.RuntimeException: Test failed in compiler.vectorization.runner.ArrayIndexFillTest.fillByteArray: Method is not compiled after 30s.
>     at compiler.vectorization.runner.VectorizationTestRunner.run(VectorizationTestRunner.java:72)
>     at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:200)
>     at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:577)
>     at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
>     at java.base/java.lang.Thread.run(Thread.java:833)
>
>
> Running with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`. Similar failures happen with the other tests.
>
> There are also failures in the pre-submit tests.

Hi @TobiHartmann , thanks for your test work. I have already noticed the failure issues. So far, all the failures I have found are under `compiler/vectorization/runner` - these are new tests added by me from this patch. Cause is that in my new test framework, I use `WhiteBox` APIs to do compilation level control for the correctness check. But it may not work if additional compiler control VM options are specified. I will fix it soon.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6828

From dlong at openjdk.java.net  Wed Dec 15 08:10:00 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Wed, 15 Dec 2021 08:10:00 GMT
Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation
In-Reply-To:
References:
Message-ID:

On Wed, 15 Dec 2021 05:59:45 GMT, Vladimir Kozlov wrote:

> A proper fix for this is to use the catchException combination. However, that introduces significant cold startup performance regression. JDK-8278447 tracks the work to address the performance regression using catchException and asSpreader combinator. It may require significant work and refactoring which is risky for JDK 18.
> > It is proposed to implement a workaround in C2 to white list the relevant methods (all methods in sun.invoke.util.ValueConversions class) not to omit stack trace when exception is thrown in them. > > Added new regression test. Tested tier1-3. Marked as reviewed by dlong (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk18/pull/27 From pli at openjdk.java.net Wed Dec 15 09:21:39 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Wed, 15 Dec 2021 09:21:39 GMT Subject: RFR: 8183390: Fix and re-enable post loop vectorization [v2] In-Reply-To: References: Message-ID: > ### Background > > Post loop vectorization is a C2 compiler optimization in an experimental > VM feature called PostLoopMultiversioning. It transforms the range-check > eliminated post loop to a 1-iteration vectorized loop with vector mask. > This optimization was contributed by Intel in 2016 to support x86 AVX512 > masked vector instructions. However, it was disabled soon after an issue > was found. Due to insufficient maintenance in these years, multiple bugs > have been accumulated inside. But we (Arm) still think this is a useful > framework for vector mask support in C2 auto-vectorized loops, for both > x86 AVX512 and AArch64 SVE. Hence, we propose this to fix and re-enable > post loop vectorization. > > ### Changes in this patch > > This patch reworks post loop vectorization. The most significant change > is removing vector mask support in C2 x86 backend and re-implementing > it in the mid-end. With this, we can re-enable post loop vectorization > for platforms other than x86. > > Previous implementation hard-codes x86 k1 register as a reserved AVX512 > opmask register and defines two routines (setvectmask/restorevectmask) > to set and restore the value of k1. But after [JDK-8211251](https://bugs.openjdk.java.net/browse/JDK-8211251) which encodes > AVX512 instructions as unmasked by default, generated vector masks are > no longer used in AVX512 vector instructions. 
To fix incorrect codegen > and add vector mask support for more platforms, we turn to add a vector > mask input to C2 mid-end IRs. Specifically, we use a VectorMaskGenNode > to generate a mask and replace all Load/Store nodes in the post loop > into LoadVectorMasked/StoreVectorMasked nodes with that mask input. This > IR form is exactly the same to those which are used in VectorAPI mask > support. For now, we only add mask inputs for Load/Store nodes because > we don't have reduction operations supported in post loop vectorization. > After this change, the x86 k1 register is no longer reserved and can be > allocated when PostLoopMultiversioning is enabled. > > Besides this change, we have fixed a compiler crash and five incorrect > result issues with post loop vectorization. > > **I) C2 crashes with segmentation fault in strip-mined loops** > > Previous implementation was done before C2 loop strip-mining was merged > into JDK master so it didn't take strip-mined loops into consideration. > In C2's strip mined loops, post loop is not the sibling of the main loop > in ideal loop tree. Instead, it's the sibling of the main loop's parent. > This patch fixed a SIGSEGV issue caused by NULL pointer when locating > post loop from strip-mined main loop. > > **II) Incorrect result issues with post loop vectorization** > > We have also fixed five incorrect vectorization issues. Some of them are > hidden deep and can only be reproduced with corner cases. These issues > have a common cause that it assumes the post loop can be vectorized if > the vectorization in corresponding main loop is successful. But in many > cases this assumption is wrong. Below are details. > > - **[Issue-1] Incorrect vectorization for partial vectorizable loops** > > This issue can be reproduced by below loop where only some operations in > the loop body are vectorizable. 
> > for (int i = 0; i < 10000; i++) { > res[i] = a[i] * b[i]; > k = 3 * k + 1; > } > > In the main loop, superword can work well if parts of the operations in > loop body are not vectorizable since those parts can be unrolled only. > But for post loops, we don't create vectors through combining scalar IRs > generated from loop unrolling. Instead, we are doing scalars to vectors > replacement for all operations in the loop body. Hence, all operations > should be either vectorized together or not vectorized at all. To fix > this kind of cases, we add an extra field "_slp_vector_pack_count" in > CountedLoopNode to record the eventual count of vector packs in the main > loop. This value is then passed to post loop and compared with post loop > pack count. Vectorization will be bailed out in post loop if it creates > more vector packs than in the main loop. > > - **[Issue-2] Incorrect result in loops with growing-down vectors** > > This issue appears with growing-down vectors, that is, vectors that grow > to smaller memory address as the loop iterates. It can be reproduced by > below counting-up loop with negative scale value in array index. > > for (int i = 0; i < 10000; i++) { > a[MAX - i] = b[MAX - i]; > } > > Cause of this issue is that for a growing-down vector, generated vector > mask value has reversed vector-lane order so it masks incorrect vector > lanes. Note that if negative scale value appears in counting-down loops, > the vector will be growing up. With this rule, we fix the issue by only > allowing positive array index scales in counting-up loops and negative > array index scales in counting-down loops. This check is done with the > help of SWPointer by comparing scale values in each memory access in the > loop with loop stride value. > > - **[Issue-3] Incorrect result in manually unrolled loops** > > This issue can be reproduced by below manually unrolled loop. 
> > for (int i = 0; i < 10000; i += 2) { > c[i] = a[i] + b[i]; > c[i + 1] = a[i + 1] * b[i + 1]; > } > > In this loop, operations in the 2nd statement duplicate those in the 1st > statement with a small memory address offset. Vectorization in the main > loop works well in this case because C2 does further unrolling and pack > combination. But we cannot vectorize the post loop through replacement > from scalars to vectors because it creates duplicated vector operations. > To fix this, we restrict post loop vectorization to loops with stride > values of 1 or -1. > > - **[Issue-4] Incorrect result in loops with mixed vector element sizes** > > This issue is found after we enable post loop vectorization for AArch64. > It's reproducible by multiple array operations with different element > sizes inside a loop. On x86, there is no issue because the values of x86 > AVX512 opmasks only depend on which vector lanes are active. But AArch64 > is different - the values of SVE predicates also depend on lane size of > the vector. Hence, on AArch64 SVE, if a loop has mixed vector element > sizes, we should use different vector masks. For now, we just support > loops with only one vector element size, i.e., "int + float" vectors in > a single loop is ok but "int + double" vectors in a single loop is not > vectorizable. This fix also enables subword vectors support to make all > primitive type array operations vectorizable. > > - **[Issue-5] Incorrect result in loops with potential data dependence** > > This issue can be reproduced by below corner case on AArch64 only. > > for (int i = 0; i < 10000; i++) { > a[i] = x; > a[i + OFFSET] = y; > } > > In this case, two stores in the loop have data dependence if the OFFSET > value is smaller than the vector length. So we cannot do vectorization > through replacing scalars to vectors. But the main loop vectorization > in this case is successful on AArch64 because AArch64 has partial vector > load/store support. 
It splits vector fill with different values in lanes > to several smaller-sized fills. In this patch, we add additional data > dependence check for this kind of cases. The check is also done with the > help of SWPointer class. In this check, we require that every two memory > accesses (with at least one store) of the same element type (or subword > size) in the loop has the same array index expression. > > ### Tests > > So far we have tested full jtreg on both x86 AVX512 and AArch64 SVE with > experimental VM option "PostLoopMultiversioning" turned on. We found no > issue in all tests. We notice that those existing cases are not enough > because some of above issues are not spotted by them. We would like to > add some new cases but we found existing vectorization tests are a bit > cumbersome - golden results must be pre-calculated and hard-coded in the > test code for correctness verification. Thus, in this patch, we propose > a new vectorization testing framework. > > Our new framework brings a simpler way to add new cases. For a new test > case, we only need to create a new method annotated with "@Test". The > test runner will invoke each annotated method twice automatically. First > time it runs in the interpreter and second time it's forced compiled by > C2. Then the two return results are compared. So in this framework each > test method should return a primitive value or an array of primitives. > In this way, no extra verification code for vectorization correctness is > required. This test runner is still jtreg-based and takes advantages of > the jtreg WhiteBox API, which enables test methods running at specific > compilation levels. Each test class inside is also jtreg-based. It just > need to inherit from the test runner class and run with two additional > options "-Xbootclasspath/a:." and "-XX:+WhiteBoxAPI". > > ### Summary & Future work > > In this patch, we reworked post loop vectorization. 
We made it platform > independent and fixed several issues inside. We also implemented a new > vectorization testing framework with many test cases inside. Meanwhile, > we did some code cleanups. > > This patch only touches C2 code guarded with PostLoopMultiversioning, > except a few data structure changes. So, there's no behavior change when > experimental VM option PostLoopMultiversioning is off. Also, to reduce > risks, we still propose to keep post loop vectorization experimental for > now. But if it receives positive feedback, we would like to change it to > non-experimental in the future. Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Fix issues in newly added test framework Change-Id: I6e61abf05e9665325cb3abaf407360b18355c6b1 - Merge branch 'master' into postloop Change-Id: I9bb5a808d7540426dedb141fd198d25eb1f569e6 - 8183390: Fix and re-enable post loop vectorization ** Background Post loop vectorization is a C2 compiler optimization in an experimental VM feature called PostLoopMultiversioning. It transforms the range-check eliminated post loop to a 1-iteration vectorized loop with vector mask. This optimization was contributed by Intel in 2016 to support x86 AVX512 masked vector instructions. However, it was disabled soon after an issue was found. Due to insufficient maintenance in these years, multiple bugs have been accumulated inside. But we (Arm) still think this is a useful framework for vector mask support in C2 auto-vectorized loops, for both x86 AVX512 and AArch64 SVE. Hence, we propose this to fix and re-enable post loop vectorization. ** Changes in this patch This patch reworks post loop vectorization. The most significant change is removing vector mask support in C2 x86 backend and re-implementing it in the mid-end. 
With this, we can re-enable post loop vectorization for platforms other than x86. Previous implementation hard-codes x86 k1 register as a reserved AVX512 opmask register and defines two routines (setvectmask/restorevectmask) to set and restore the value of k1. But after JDK-8211251 which encodes AVX512 instructions as unmasked by default, generated vector masks are no longer used in AVX512 vector instructions. To fix incorrect codegen and add vector mask support for more platforms, we turn to add a vector mask input to C2 mid-end IRs. Specifically, we use a VectorMaskGenNode to generate a mask and replace all Load/Store nodes in the post loop into LoadVectorMasked/StoreVectorMasked nodes with that mask input. This IR form is exactly the same to those which are used in VectorAPI mask support. For now, we only add mask inputs for Load/Store nodes because we don't have reduction operations supported in post loop vectorization. After this change, the x86 k1 register is no longer reserved and can be allocated when PostLoopMultiversioning is enabled. Besides this change, we have fixed a compiler crash and five incorrect result issues with post loop vectorization. - 1) C2 crashes with segmentation fault in strip-mined loops Previous implementation was done before C2 loop strip-mining was merged into JDK master so it didn't take strip-mined loops into consideration. In C2's strip mined loops, post loop is not the sibling of the main loop in ideal loop tree. Instead, it's the sibling of the main loop's parent. This patch fixed a SIGSEGV issue caused by NULL pointer when locating post loop from strip-mined main loop. - 2) Incorrect result issues with post loop vectorization We have also fixed five incorrect vectorization issues. Some of them are hidden deep and can only be reproduced with corner cases. These issues have a common cause that it assumes the post loop can be vectorized if the vectorization in corresponding main loop is successful. 
But in many cases this assumption is wrong. Below are details. [Issue-1] Incorrect vectorization for partial vectorizable loops This issue can be reproduced by below loop where only some operations in the loop body are vectorizable. for (int i = 0; i < 10000; i++) { res[i] = a[i] * b[i]; k = 3 * k + 1; } In the main loop, superword can work well if parts of the operations in loop body are not vectorizable since those parts can be unrolled only. But for post loops, we don't create vectors through combining scalar IRs generated from loop unrolling. Instead, we are doing scalars to vectors replacement for all operations in the loop body. Hence, all operations should be either vectorized together or not vectorized at all. To fix this kind of cases, we add an extra field "_slp_vector_pack_count" in CountedLoopNode to record the eventual count of vector packs in the main loop. This value is then passed to post loop and compared with post loop pack count. Vectorization will be bailed out in post loop if it creates more vector packs than in the main loop. [Issue-2] Incorrect result in loops with growing-down vectors This issue appears with growing-down vectors, that is, vectors that grow to smaller memory address as the loop iterates. It can be reproduced by below counting-up loop with negative scale value in array index. for (int i = 0; i < 10000; i++) { a[MAX - i] = b[MAX - i]; } Cause of this issue is that for a growing-down vector, generated vector mask value has reversed vector-lane order so it masks incorrect vector lanes. Note that if negative scale value appears in counting-down loops, the vector will be growing up. With this rule, we fix the issue by only allowing positive array index scales in counting-up loops and negative array index scales in counting-down loops. This check is done with the help of SWPointer by comparing scale values in each memory access in the loop with loop stride value. 
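The two bail-out rules described for Issue-1 and Issue-2 can be sketched as plain Java predicates. This is only an illustrative model, not the HotSpot C++ code: the class name, method names, and the numbers passed in `main` are all hypothetical, and the real checks work on C2 IR (the pack count recorded in CountedLoopNode and the scale values obtained through SWPointer).

```java
// Hypothetical model of the Issue-1 and Issue-2 checks; not the actual C2 implementation.
class PostLoopChecks {
    // Issue-1: bail out if the post loop would create more vector packs
    // than the main loop recorded.
    static boolean packCountAllows(int mainLoopPackCount, int postLoopPackCount) {
        return postLoopPackCount <= mainLoopPackCount;
    }

    // Issue-2: allow only positive index scales in counting-up loops and
    // negative index scales in counting-down loops, so vectors never grow down.
    static boolean scaleMatchesStride(int scale, int stride) {
        return (scale > 0 && stride > 0) || (scale < 0 && stride < 0);
    }

    // Combined decision: every memory access in the loop must pass the scale check.
    static boolean canVectorizePostLoop(int mainLoopPackCount, int postLoopPackCount,
                                        int stride, int[] scales) {
        if (!packCountAllows(mainLoopPackCount, postLoopPackCount)) {
            return false;
        }
        for (int scale : scales) {
            if (!scaleMatchesStride(scale, stride)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // a[i] = b[i] in a counting-up loop: both accesses have scale +1 (made-up pack counts).
        System.out.println(canVectorizePostLoop(3, 3, 1, new int[] {1, 1}));
        // a[MAX - i] = b[MAX - i] in a counting-up loop: scale -1 with stride +1.
        System.out.println(canVectorizePostLoop(3, 3, 1, new int[] {-1, -1}));
    }
}
```

In this model the `a[MAX - i]` loop is rejected because its index scale and the loop stride have opposite signs, which corresponds to the growing-down case described above.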
[Issue-3] Incorrect result in manually unrolled loops This issue can be reproduced by below manually unrolled loop. for (int i = 0; i < 10000; i += 2) { c[i] = a[i] + b[i]; c[i + 1] = a[i + 1] * b[i + 1]; } In this loop, operations in the 2nd statement duplicate those in the 1st statement with a small memory address offset. Vectorization in the main loop works well in this case because C2 does further unrolling and pack combination. But we cannot vectorize the post loop through replacement from scalars to vectors because it creates duplicated vector operations. To fix this, we restrict post loop vectorization to loops with stride values of 1 or -1. [Issue-4] Incorrect result in loops with mixed vector element sizes This issue is found after we enable post loop vectorization for AArch64. It's reproducible by multiple array operations with different element sizes inside a loop. On x86, there is no issue because the values of x86 AVX512 opmasks only depend on which vector lanes are active. But AArch64 is different - the values of SVE predicates also depend on lane size of the vector. Hence, on AArch64 SVE, if a loop has mixed vector element sizes, we should use different vector masks. For now, we just support loops with only one vector element size, i.e., "int + float" vectors in a single loop is ok but "int + double" vectors in a single loop is not vectorizable. This fix also enables subword vectors support to make all primitive type array operations vectorizable. [Issue-5] Incorrect result in loops with potential data dependence This issue can be reproduced by below corner case on AArch64 only. for (int i = 0; i < 10000; i++) { a[i] = x; a[i + OFFSET] = y; } In this case, two stores in the loop have data dependence if the OFFSET value is smaller than the vector length. So we cannot do vectorization through replacing scalars to vectors. But the main loop vectorization in this case is successful on AArch64 because AArch64 has partial vector load/store support. 
It splits vector fill with different values in lanes to several smaller-sized fills. In this patch, we add additional data dependence check for this kind of cases. The check is also done with the help of SWPointer class. In this check, we require that every two memory accesses (with at least one store) of the same element type (or subword size) in the loop has the same array index expression. ** Tests So far we have tested full jtreg on both x86 AVX512 and AArch64 SVE with experimental VM option "PostLoopMultiversioning" turned on. We found no issue in all tests. We notice that those existing cases are not enough because some of above issues are not spotted by them. We would like to add some new cases but we found existing vectorization tests are a bit cumbersome - golden results must be pre-calculated and hard-coded in the test code for correctness verification. Thus, in this patch, we propose a new vectorization testing framework. Our new framework brings a simpler way to add new cases. For a new test case, we only need to create a new method annotated with "@Test". The test runner will invoke each annotated method twice automatically. First time it runs in the interpreter and second time it's forced compiled by C2. Then the two return results are compared. So in this framework each test method should return a primitive value or an array of primitives. In this way, no extra verification code for vectorization correctness is required. This test runner is still jtreg-based and takes advantages of the jtreg WhiteBox API, which enables test methods running at specific compilation levels. Each test class inside is also jtreg-based. It just need to inherit from the test runner class and run with two additional options "-Xbootclasspath/a:." and "-XX:+WhiteBoxAPI". ** Summary & Future work In this patch, we reworked post loop vectorization. We made it platform independent and fixed several issues inside. 
We also implemented a new vectorization testing framework with many test cases inside. Meanwhile, we did some code cleanups. This patch only touches C2 code guarded with PostLoopMultiversioning, except a few data structure changes. So, there's no behavior change when experimental VM option PostLoopMultiversioning is off. Also, to reduce risks, we still propose to keep post loop vectorization experimental for now. But if it receives positive feedback, we would like to change it to non-experimental in the future. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6828/files - new: https://git.openjdk.java.net/jdk/pull/6828/files/cae3b16b..85ce597d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6828&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6828&range=00-01 Stats: 922 lines in 75 files changed: 688 ins; 84 del; 150 mod Patch: https://git.openjdk.java.net/jdk/pull/6828.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6828/head:pull/6828 PR: https://git.openjdk.java.net/jdk/pull/6828 From aph at openjdk.java.net Wed Dec 15 10:40:00 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 15 Dec 2021 10:40:00 GMT Subject: [jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 10:45:28 GMT, Patric Hedlin wrote: > Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support. > > Implementation with slight focus on balance between footprint and efficiency, trying to utilise a dual SIMD path (e.g. Neoverse N1) for the additional Ascii-check and avoid performance loss in the ISO-only case. > > - Interleaved ISO and ASCII check code. > - Avoid 'umaxv' in the ISO main flow. > - Using post inc in main loop. > - Retain 8-char loop. > - Removing conditional prefetch (no upside). > - Adding ISO-8859-1 to encode-decode benchmark. 
>
> Testing: tier1-6
>
> The revised version compares like this (master vs. update).
>
> Benchmark                   (size)       (type)  Mode  Cnt    Score   Error  Units
> CharsetEncodeDecode.encode   16384        UTF-8  avgt   30   17.920 ± 0.229  us/op
> CharsetEncodeDecode.encode   16384         BIG5  avgt   30   18.867 ± 0.356  us/op
> CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30   17.419 ± 0.220  us/op
> CharsetEncodeDecode.encode   16384   ISO-8859-1  avgt   30    6.200 ± 0.134  us/op
> CharsetEncodeDecode.encode   16384        ASCII  avgt   30   17.149 ± 0.219  us/op
> CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  135.115 ± 1.440  us/op
>
>
> Benchmark                   (size)       (type)  Mode  Cnt    Score   Error  Units
> CharsetEncodeDecode.encode   16384        UTF-8  avgt   30    9.018 ± 0.179  us/op
> CharsetEncodeDecode.encode   16384         BIG5  avgt   30   10.550 ± 0.470  us/op
> CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30    8.843 ± 0.187  us/op
> CharsetEncodeDecode.encode   16384   ISO-8859-1  avgt   30    6.406 ± 0.155  us/op
> CharsetEncodeDecode.encode   16384        ASCII  avgt   30    8.822 ± 0.173  us/op
> CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  135.195 ± 1.432  us/op

I don't think this should go straight into the 18 release branch. It looks OK for mainline.

-------------

PR: https://git.openjdk.java.net/jdk18/pull/20

From duke at openjdk.java.net  Wed Dec 15 12:21:17 2021
From: duke at openjdk.java.net (Tobias Holenstein)
Date: Wed, 15 Dec 2021 12:21:17 GMT
Subject: RFR: JDK-8278329: some TraceDeoptimization code not included in PRODUCT build
Message-ID:

After "JDK-8154011: Make `TraceDeoptimization` a diagnostic flag" some code was not included in the PRODUCT build. Removed all the #ifndef PRODUCT guards around `TraceDeoptimization` checks and made sure to be consistent.

The DEOPT PACKING messages were controlled by `PrintDeoptimizationDetails` (develop flag), but DEOPT UNPACKING is controlled by `TraceDeoptimization` (product flag). Therefore changed DEOPT PACKING messages to be controlled by `TraceDeoptimization` as well.

Checked that tests are not affected.
Checked on Aurora that performance is not affected.

-------------

Commit messages:
 - missing #ifndef PRODUCT added
 - restore PrintDeoptimizationDetails as develop flag
 - remove UnlockDiagnosticVMOptions duplicate in TestDeoptOOM.java
 - Update TestDeoptOMM.cpp flags
 - UnlockDiagnosticVMOptions in TestDeoptOOM.java
 - Merge remote-tracking branch 'origin/master' into JDK-8278329
 - fixed some formatting of PrintDeoptimizationDetails and TraceDeoptimization
 - JDK-8278329: some TraceDeoptimization code not included in PRODUCT build

Changes: https://git.openjdk.java.net/jdk/pull/6746/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6746&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8278329
  Stats: 118 lines in 4 files changed: 45 ins; 54 del; 19 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6746.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6746/head:pull/6746

PR: https://git.openjdk.java.net/jdk/pull/6746

From jbhateja at openjdk.java.net  Wed Dec 15 13:36:58 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 15 Dec 2021 13:36:58 GMT
Subject: RFR: 8183390: Fix and re-enable post loop vectorization
In-Reply-To: <8xK2PmI8ViNhtdSJhC2OEHjbLEc2rKCQRBAFGLw13G8=.ec7e1206-98b3-4777-a1aa-9247dfcb7bd6@github.com>
References: <8xK2PmI8ViNhtdSJhC2OEHjbLEc2rKCQRBAFGLw13G8=.ec7e1206-98b3-4777-a1aa-9247dfcb7bd6@github.com>
Message-ID:

On Wed, 15 Dec 2021 07:25:09 GMT, Pengfei Li wrote:

>> I haven't looked at the code yet but just gave this a quick run through our testing.
I'm seeing several hundred failures: >> >> java.lang.ClassNotFoundException: compiler/vectorization/runner/ArrayCopyTest >> at java.base/java.lang.Class.forName0(Native Method) >> at java.base/java.lang.Class.forName(Class.java:383) >> at java.base/java.lang.Class.forName(Class.java:376) >> at compiler.vectorization.runner.VectorizationTestRunner.createTestInstance(VectorizationTestRunner.java:183) >> at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:199) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:577) >> at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) >> at java.base/java.lang.Thread.run(Thread.java:833) >> java.lang.RuntimeException: Cannot create test instance for class compiler/vectorization/runner/ArrayCopyTest >> at compiler.vectorization.runner.VectorizationTestRunner.fail(VectorizationTestRunner.java:195) >> at compiler.vectorization.runner.VectorizationTestRunner.createTestInstance(VectorizationTestRunner.java:188) >> at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:199) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:577) >> at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) >> at java.base/java.lang.Thread.run(Thread.java:833) >> >> Similar ClassNotFoundExceptions happen with the other tests. The new tests also intermittently time out with `-Xcomp`. >> >> >> java.lang.RuntimeException: Test failed in compiler.vectorization.runner.ArrayIndexFillTest.fillByteArray: Method is not compiled after 30s. 
>> at compiler.vectorization.runner.VectorizationTestRunner.run(VectorizationTestRunner.java:72) >> at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:200) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:577) >> at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) >> at java.base/java.lang.Thread.run(Thread.java:833) >> >> >> Running with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`. Similar failures happen with the other tests. >> >> There are also failures in the pre-submit tests. > > Hi @TobiHartmann , thanks for your test work. I have already noticed the failure issues. So far, all the failures I have found are under `compiler/vectorization/runner` - these are new tests added by me from this patch. Cause is that in my new test framework, I use `WhiteBox` APIs to do compilation level control for the correctness check. But it may not work if additional compiler control VM options are specified. I will fix it soon. Hi @pfustc , thanks will check the behavior on AVX-512 target with your patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/6828 From coleenp at openjdk.java.net Wed Dec 15 13:49:01 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 15 Dec 2021 13:49:01 GMT Subject: RFR: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation [v2] In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 17:42:00 GMT, Coleen Phillimore wrote: >> This change makes VM_Version_Ext part of VM_Version (the platform dependent part) and moves some duplicated code. x86 had the most code in VM_Version_Ext, so the most code moved there. There might be some unneeded functions but I didn't want to remove them with this change. 
>> >> Tier1 (tier2-4 testing in progress) on linux and windows for x86, aarch64, Oracle platforms and tested builds on: >> linux-aarch64-debug,linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug >> and >> linux-x64-zero,linux-x64-zero-debug,linux-x86-zero,linux-x86-zero-debug >> >> Ran JFR tests manually (it uses os_perf* CPUInformationInterface code). > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Added an initialization assert. Thanks for the re-review, David. ------------- PR: https://git.openjdk.java.net/jdk/pull/6820 From coleenp at openjdk.java.net Wed Dec 15 13:49:02 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 15 Dec 2021 13:49:02 GMT Subject: Integrated: 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation In-Reply-To: References: Message-ID: On Mon, 13 Dec 2021 23:14:43 GMT, Coleen Phillimore wrote: > This change makes VM_Version_Ext part of VM_Version (the platform dependent part) and moves some duplicated code. x86 had the most code in VM_Version_Ext, so the most code moved there. There might be some unneeded functions but I didn't want to remove them with this change. > > Tier1 (tier2-4 testing in progress) on linux and windows for x86, aarch64, Oracle platforms and tested builds on: > linux-aarch64-debug,linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug > and > linux-x64-zero,linux-x64-zero-debug,linux-x86-zero,linux-x86-zero-debug > > Ran JFR tests manually (it uses os_perf* CPUInformationInterface code). This pull request has now been integrated. 
Changeset: 1e3ae3be Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/1e3ae3be02e1fa76c632ef289dd1887c7fa369ec Stats: 2976 lines in 33 files changed: 1109 ins; 1830 del; 37 mod 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation Reviewed-by: dholmes, hseigel ------------- PR: https://git.openjdk.java.net/jdk/pull/6820 From phedlin at openjdk.java.net Wed Dec 15 14:12:59 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Wed, 15 Dec 2021 14:12:59 GMT Subject: [jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: Message-ID: <5EmZGX3oIMMll_eKwWVNV1xkPCQRwXjnb_QMRb2tcjA=.c00aa130-d0ef-49bb-b513-32e4709b28cb@github.com> On Wed, 15 Dec 2021 10:37:04 GMT, Andrew Haley wrote: > I don't think this should go straight into the 18 release branch. It looks OK for mainline. Any particular reason it should not be included in JDK-18? ------------- PR: https://git.openjdk.java.net/jdk18/pull/20 From dnsimon at openjdk.java.net Wed Dec 15 14:19:04 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 15 Dec 2021 14:19:04 GMT Subject: RFR: JDK-8278329: some TraceDeoptimization code not included in PRODUCT build In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 14:46:05 GMT, Tobias Holenstein wrote: > After "JDK-8154011: Make `TraceDeoptimization` a diagnostic flag" some code was not included in the PRODUCT build. > > Removed all the #ifndef PRODUCT guards around `TraceDeoptimization` checks and made sure to be consistent. > > The DEOPT PACKING messages were controlled by `PrintDeoptimizationDetails` (develop flag), but DEOPT UNPACKING is controlled by `TraceDeoptimization` (product flag),. Therefore changed DEOPT PACKING messages to be controlled by `TraceDeoptimization` as well. > > Checked that tests are not affected. Checked on Aurora that performance is not affected. Marked as reviewed by dnsimon (Committer). 
src/hotspot/share/runtime/deoptimization.cpp line 246:

> 244:   if (TraceDeoptimization) {
> 245:     tty->print_cr("SAVED OOP RESULT " INTPTR_FORMAT " in thread " INTPTR_FORMAT, p2i(result), p2i(thread));
> 246:     tty->cr();

Is this `tty->cr()` necessary given the `print_cr` call above?

src/hotspot/share/runtime/deoptimization.cpp line 734:

> 732:     tty->print_cr("DEOPT UNPACKING thread " INTPTR_FORMAT " vframeArray " INTPTR_FORMAT " mode %d",
> 733:                   p2i(thread), p2i(array), exec_mode);
> 734:     tty->cr();

Same question as above about the necessity of `cr()`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6746

From aph at openjdk.java.net Wed Dec 15 15:03:08 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 15 Dec 2021 15:03:08 GMT
Subject: [jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64
In-Reply-To: <5EmZGX3oIMMll_eKwWVNV1xkPCQRwXjnb_QMRb2tcjA=.c00aa130-d0ef-49bb-b513-32e4709b28cb@github.com>
References: <5EmZGX3oIMMll_eKwWVNV1xkPCQRwXjnb_QMRb2tcjA=.c00aa130-d0ef-49bb-b513-32e4709b28cb@github.com>
Message-ID: <2uKBlDzz5yUtlKD3lUXJj7nX1IW620fZomAKHzHEXvg=.da646d0d-a28f-4605-ab24-93a245a85e9b@github.com>

On Wed, 15 Dec 2021 14:09:25 GMT, Patric Hedlin wrote:

> > I don't think this should go straight into the 18 release branch. It looks OK for mainline.
>
> Any particular reason it should not be included in JDK-18?

We have been in RDP1 since December 9. This means that any enhancement is covered by the Late-Enhancement Request Process. https://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process . I don't think this patch is urgent enough for that.
------------- PR: https://git.openjdk.java.net/jdk18/pull/20 From phedlin at openjdk.java.net Wed Dec 15 15:11:08 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Wed, 15 Dec 2021 15:11:08 GMT Subject: [jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: <2uKBlDzz5yUtlKD3lUXJj7nX1IW620fZomAKHzHEXvg=.da646d0d-a28f-4605-ab24-93a245a85e9b@github.com> References: <5EmZGX3oIMMll_eKwWVNV1xkPCQRwXjnb_QMRb2tcjA=.c00aa130-d0ef-49bb-b513-32e4709b28cb@github.com> <2uKBlDzz5yUtlKD3lUXJj7nX1IW620fZomAKHzHEXvg=.da646d0d-a28f-4605-ab24-93a245a85e9b@github.com> Message-ID: On Wed, 15 Dec 2021 14:59:47 GMT, Andrew Haley wrote: > We're in RDP1 since December 9. This means that any enhancement is covered by the Late-Enhancement Request Process. https://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process . I don't think this patch is urgent enough for that. It has been classified as a performance regression (bug) in line with the x86 issue (JDK-8274242). Do you mean we should change this _now_? Aarch64 would be the only platform not to address the issue in JDK-18. ------------- PR: https://git.openjdk.java.net/jdk18/pull/20 From aph at openjdk.java.net Wed Dec 15 15:27:06 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 15 Dec 2021 15:27:06 GMT Subject: [jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: <5EmZGX3oIMMll_eKwWVNV1xkPCQRwXjnb_QMRb2tcjA=.c00aa130-d0ef-49bb-b513-32e4709b28cb@github.com> <2uKBlDzz5yUtlKD3lUXJj7nX1IW620fZomAKHzHEXvg=.da646d0d-a28f-4605-ab24-93a245a85e9b@github.com> Message-ID: On Wed, 15 Dec 2021 15:07:41 GMT, Patric Hedlin wrote: > > We're in RDP1 since December 9. This means that any enhancement is covered by the Late-Enhancement Request Process. https://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process . I don't think this patch is urgent enough for that. 
> > It has been classified as a performance regression (bug) in line with the x86 issue (JDK-8274242). Do you mean we should change this _now_? Aarch64 would be the only platform not to address the issue in JDK-18. I see your point, and that is a significant difference. It's unfortunate that AArch64 got this patch late in the process, but as it's a bug, and Java performance isn't supposed to regress in a release, it does make sense to fix it. Having said that, I've had a lot of bad experiences with late patches, and it'd be a dreadful shame if we broke AArch64 string handling. I suppose the best way to proceed with this is to have two AArch64 reviewers go through the patch instruction by instruction, just to make sure the test covers all corner cases. Thanks. ------------- PR: https://git.openjdk.java.net/jdk18/pull/20 From rriggs at openjdk.java.net Wed Dec 15 16:12:56 2021 From: rriggs at openjdk.java.net (Roger Riggs) Date: Wed, 15 Dec 2021 16:12:56 GMT Subject: [jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 10:45:28 GMT, Patric Hedlin wrote: > Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support. > > The motivation is found in the original x86 issue ([JDK-8274242](https://bugs.openjdk.java.net/browse/JDK-8274242)). > > Implementation with slight focus on balance between footprint and efficiency, trying to utilise a dual SIMD path (e.g. Neoverse N1) for the additional Ascii-check and avoid performance loss in the ISO-only case. > > - Interleaved ISO and ASCII check code. > - Avoid 'umaxv' in the ISO main flow. > - Using post inc in main loop. > - Retain 8-char loop. > - Removing conditional prefetch (no upside). > - Adding ISO-8859-1 to encode-decode benchmark. > > Testing: tier1-6 > > The revised version compares like this (master vs. update). 
>
> Benchmark                   (size)       (type)  Mode  Cnt    Score   Error  Units
> CharsetEncodeDecode.encode   16384        UTF-8  avgt   30   17.920 ± 0.229  us/op
> CharsetEncodeDecode.encode   16384         BIG5  avgt   30   18.867 ± 0.356  us/op
> CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30   17.419 ± 0.220  us/op
> CharsetEncodeDecode.encode   16384   ISO-8859-1  avgt   30    6.200 ± 0.134  us/op
> CharsetEncodeDecode.encode   16384        ASCII  avgt   30   17.149 ± 0.219  us/op
> CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  135.115 ± 1.440  us/op
>
>
> Benchmark                   (size)       (type)  Mode  Cnt    Score   Error  Units
> CharsetEncodeDecode.encode   16384        UTF-8  avgt   30    9.018 ± 0.179  us/op
> CharsetEncodeDecode.encode   16384         BIG5  avgt   30   10.550 ± 0.470  us/op
> CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30    8.843 ± 0.187  us/op
> CharsetEncodeDecode.encode   16384   ISO-8859-1  avgt   30    6.406 ± 0.155  us/op
> CharsetEncodeDecode.encode   16384        ASCII  avgt   30    8.822 ± 0.173  us/op
> CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  135.195 ± 1.432  us/op

It's viable to commit to the main line and allow it to have some bake time before requesting it to be backported. That would allow some time to build confidence about the change; it might not make the first JDK 18 release but would come in later. There is no problem requesting approval for the change but it should go through that process.

-------------

PR: https://git.openjdk.java.net/jdk18/pull/20

From coleenp at openjdk.java.net Wed Dec 15 16:22:18 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Wed, 15 Dec 2021 16:22:18 GMT
Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues
Message-ID: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com>

A recent bug in misc_flags showed that they are not safe to set concurrently, which could cause bugs. Most of the misc_flags are set at classfile parsing time or at a safepoint and never reset. This change adds an assert that the flag is set once. See CR for more details.
Tested with tier1-3.
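
The set-once rule described above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions, not the HotSpot change: the class `FlagsSketch`, the flag names `kFlagA`/`kFlagB`, and the choice of `std::atomic` for the concurrently updated word are all made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag holder for illustration; not the HotSpot InstanceKlass.
class FlagsSketch {
 public:
  enum MiscFlag : uint16_t { kFlagA = 1 << 0, kFlagB = 1 << 1 };  // made-up names

  // Flags written by one thread only (for example while parsing a class file):
  // a plain read-modify-write is enough, and the assert catches a second write.
  void set_misc_once(MiscFlag f) {
    assert((_misc & f) == 0 && "misc flag must be set only once");
    _misc = static_cast<uint16_t>(_misc | f);
  }
  bool misc_is_set(MiscFlag f) const { return (_misc & f) != 0; }

  // A flag that running threads may set at any time: an atomic OR keeps
  // concurrent updates to neighbouring bits from being lost.
  void set_concurrent(uint32_t bit) {
    _atomic_bits.fetch_or(bit, std::memory_order_release);
  }
  bool concurrent_is_set(uint32_t bit) const {
    return (_atomic_bits.load(std::memory_order_acquire) & bit) != 0;
  }

 private:
  uint16_t _misc = 0;                     // set once, never reset
  std::atomic<uint32_t> _atomic_bits{0};  // safe to update concurrently
};
```

The split mirrors the reasoning in the description: bits that are written once, from a single thread, only need a check that they are not written again, while a bit that can be set by running threads belongs in a word that is updated atomically.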
------------- Commit messages: - Move has_resolved_methods to access flags so can be set and tested concurretly. - Move has_resolved_methods to access flags so can be set and tested concurretly. - 8277216: Examine InstanceKlass::_misc_flags for concurrency issues Changes: https://git.openjdk.java.net/jdk/pull/6851/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6851&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277216 Stats: 32 lines in 2 files changed: 10 ins; 17 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6851.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6851/head:pull/6851 PR: https://git.openjdk.java.net/jdk/pull/6851 From coleenp at openjdk.java.net Wed Dec 15 16:55:32 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 15 Dec 2021 16:55:32 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v2] In-Reply-To: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> Message-ID: > Recent bug in misc_flags showed that they are not set concurrently and could cause bugs. Most of the misc_flags are set at classfile parsing time or at a safepoint and never reset. This change adds an assert that the flag is set once. See CR for more details. > Tested with tier1-3. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix typo. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6851/files - new: https://git.openjdk.java.net/jdk/pull/6851/files/61c391aa..8f2406b7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6851&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6851&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6851.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6851/head:pull/6851 PR: https://git.openjdk.java.net/jdk/pull/6851 From phh at openjdk.java.net Wed Dec 15 17:01:04 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Wed, 15 Dec 2021 17:01:04 GMT Subject: RFR: 8278241: Implement JVM SpinPause on linux-aarch64 [v5] In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 16:01:38 GMT, Evgeny Astigeevich wrote: >> This JVM SpinPause uses a spin wait stub. The stub is generated based on the `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides the description of the instruction and the instruction count. >> >> The `SpinWait` description is also used for the `_onSpinWait()` intrinsic. We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause. >> >> Testing results for fastdebug and release builds: >> - `gtest`: Passed >> - `tier1`...`tier4`: Passed >> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed >> >> JVM SpinPause is used for the synchronised statements and can benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`. 
>>
>> Benchmarking results (number of samples per an experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>>
>>
>>
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
>> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
>> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
>> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
>> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
>> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
>> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
>> +-----------+-------------------+------------+-----------+-----------+----------+---------+
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove redundant check and guarantee non-null spin_wait

Marked as reviewed by phh (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6803

From duke at openjdk.java.net Wed Dec 15 17:01:06 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 15 Dec 2021 17:01:06 GMT
Subject: Integrated: 8278241: Implement JVM SpinPause on linux-aarch64
In-Reply-To:
References:
Message-ID:

On Fri, 10 Dec 2021 21:02:53 GMT, Evgeny Astigeevich wrote:

> This JVM SpinPause uses a spin wait stub. The stub is generated based on the `SpinWait` description which is defined with `OnSpinWaitInst`/`OnSpinWaitInstCount` options. The `SpinWait` provides the description of the instruction and the instruction count.
>
> The `SpinWait` description is also used for the `_onSpinWait()` intrinsic.
> We don't have use cases when we need different implementations for the `_onSpinWait()` intrinsic and JVM SpinPause.
>
> Testing results for fastdebug and release builds:
> - `gtest`: Passed
> - `tier1`...`tier4`: Passed
> - `hotspot/jtreg/runtime/Thread/TestSpinPause.java`: Passed
>
> JVM SpinPause is used for the synchronised statements and can be benchmarked with `org.openjdk.bench.vm.lang.LockUnlock.testContendedLock`.
>
> Benchmarking results (number of samples per an experiment: 150) for Graviton2 (Neoverse N1), 1 ISB instruction:
>
>
>
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | CPU cores | Contended threads | Base ns/op | Error     | New       | Error    | Diff    |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+
> | 8         | 64                | 10007.213  | ±910.911  | 8527.346  | ±377.242 | -14.79% |
> | 16        | 64                | 10274.935  | ±880.568  | 8310.433  | ±326.845 | -19.12% |
> | 32        | 64                | 12231.947  | ±1525.364 | 9205.941  | ±394.409 | -24.74% |
> | 64        | 64                | 9929.49    | ±586.074  | 10488.695 | ±570.458 | 5.63%   |
> | 64        | 32                | 5605.119   | ±629.340  | 5023.882  | ±230.639 | -10.37% |
> | 64        | 16                | 2817.346   | ±263.696  | 2367.528  | ±94.158  | -15.97% |
> | 64        | 2                 | 870.389    | ±530.579  | 464.395   | ±126.260 | -46.65% |
> +-----------+-------------------+------------+-----------+-----------+----------+---------+

This pull request has now been integrated.
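
The Diff column in the table above appears to be the relative change of the New score against the Base score. The small C++ sketch below shows that arithmetic; the helper names `percent_diff` and `round2` are made up for the example, and the interpretation of the column is an assumption, since the email does not define it.

```cpp
#include <cmath>

// Relative change of `updated` versus `base`, in percent.
// For ns/op scores a negative value means the new build is faster.
double percent_diff(double base, double updated) {
  return (updated - base) / base * 100.0;
}

// Round to two decimal places, as the table does.
double round2(double v) {
  return std::round(v * 100.0) / 100.0;
}
```

Under that reading, the first row gives (8527.346 - 10007.213) / 10007.213, about -14.79%, and the 64 cores / 64 threads row is the only one where the new score is higher (slower) than the base.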
Changeset: bcb79fd0 Author: Evgeny Astigeevich Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/bcb79fd012c9c298e58c20c59e564e9d2c16b970 Stats: 125 lines in 5 files changed: 124 ins; 0 del; 1 mod 8278241: Implement JVM SpinPause on linux-aarch64 Reviewed-by: aph, phh ------------- PR: https://git.openjdk.java.net/jdk/pull/6803 From mchung at openjdk.java.net Wed Dec 15 17:19:00 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Wed, 15 Dec 2021 17:19:00 GMT Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 05:59:45 GMT, Vladimir Kozlov wrote: > A proper fix for this is to use the catchException combination. However, that introduces significant cold startup performance regression. JDK-8278447 tracks the work to address the performance regression using catchException and asSpreader combinator. It may require significant work and refactoring which is risky for JDK 18. > > It is proposed to implement a workaround in C2 to white list the relevant methods (all methods in sun.invoke.util.ValueConversions class) not to omit stack trace when exception is thrown in them. > > Added new regression test. Tested tier1-3. Looks good. Thanks for taking this on. ------------- Marked as reviewed by mchung (Reviewer). 
PR: https://git.openjdk.java.net/jdk18/pull/27 From kvn at openjdk.java.net Wed Dec 15 17:19:02 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 17:19:02 GMT Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation In-Reply-To: <2Kb5Q961w7Kw69AeNYu840nXSIoTi2lAlsZyKZk3mis=.9f063cd0-f593-48f6-bdb2-f99c52529cf0@github.com> References: <2Kb5Q961w7Kw69AeNYu840nXSIoTi2lAlsZyKZk3mis=.9f063cd0-f593-48f6-bdb2-f99c52529cf0@github.com> Message-ID: On Wed, 15 Dec 2021 07:00:17 GMT, David Holmes wrote: >> A proper fix for this is to use the catchException combination. However, that introduces significant cold startup performance regression. JDK-8278447 tracks the work to address the performance regression using catchException and asSpreader combinator. It may require significant work and refactoring which is risky for JDK 18. >> >> It is proposed to implement a workaround in C2 to white list the relevant methods (all methods in sun.invoke.util.ValueConversions class) not to omit stack trace when exception is thrown in them. >> >> Added new regression test. Tested tier1-3. > > src/hotspot/share/oops/method.cpp line 827: > >> 825: */ >> 826: bool Method::can_omit_stack_trace() { >> 827: if (method_holder()->class_loader_data()->is_boot_class_loader_data()) { > > Do you actually need to check this? @dholmes-ora, thank you for looking on it. I discussed it with Mandy and agreed that we need to narrow down this workaround as much as possible. That is why it is done only for system class loaded by null loader. 
------------- PR: https://git.openjdk.java.net/jdk18/pull/27 From mchung at openjdk.java.net Wed Dec 15 17:22:55 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Wed, 15 Dec 2021 17:22:55 GMT Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation In-Reply-To: References: <2Kb5Q961w7Kw69AeNYu840nXSIoTi2lAlsZyKZk3mis=.9f063cd0-f593-48f6-bdb2-f99c52529cf0@github.com> Message-ID: On Wed, 15 Dec 2021 17:15:46 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/oops/method.cpp line 827: >> >>> 825: */ >>> 826: bool Method::can_omit_stack_trace() { >>> 827: if (method_holder()->class_loader_data()->is_boot_class_loader_data()) { >> >> Do you actually need to check this? > > @dholmes-ora, thank you for looking on it. > I discussed it with Mandy and agreed that we need to narrow down this workaround as much as possible. That is why it is done only for system class loaded by null loader. David has a good observation. There will be no split package for modules. So sun.invoke.util classes will only be loaded from java.base. The boot loader is not needed. ------------- PR: https://git.openjdk.java.net/jdk18/pull/27 From kvn at openjdk.java.net Wed Dec 15 17:48:00 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 17:48:00 GMT Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation In-Reply-To: References: <2Kb5Q961w7Kw69AeNYu840nXSIoTi2lAlsZyKZk3mis=.9f063cd0-f593-48f6-bdb2-f99c52529cf0@github.com> Message-ID: <3OyMsiLC_WdcDg3mFI_rJN1m-_ahV2uB6mzzXriQ1Lg=.dd9eaa23-ebb0-4a0a-9aaf-6c384582fcba@github.com> On Wed, 15 Dec 2021 17:19:25 GMT, Mandy Chung wrote: >> @dholmes-ora, thank you for looking on it. >> I discussed it with Mandy and agreed that we need to narrow down this workaround as much as possible. That is why it is done only for system class loaded by null loader. > > David has a good observation. 
There will be no split package for modules. So sun.invoke.util classes will only be loaded from java.base. The boot loader is not needed. Okay, I will remove it. ------------- PR: https://git.openjdk.java.net/jdk18/pull/27 From kvn at openjdk.java.net Wed Dec 15 17:56:23 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 17:56:23 GMT Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation [v2] In-Reply-To: References: Message-ID: > A proper fix for this is to use the catchException combination. However, that introduces significant cold startup performance regression. JDK-8278447 tracks the work to address the performance regression using catchException and asSpreader combinator. It may require significant work and refactoring which is risky for JDK 18. > > It is proposed to implement a workaround in C2 to white list the relevant methods (all methods in sun.invoke.util.ValueConversions class) not to omit stack trace when exception is thrown in them. > > Added new regression test. Tested tier1-3. 
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Removed boot classloader check ------------- Changes: - all: https://git.openjdk.java.net/jdk18/pull/27/files - new: https://git.openjdk.java.net/jdk18/pull/27/files/3c23350d..5b6805b6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk18&pr=27&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk18&pr=27&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk18/pull/27.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/27/head:pull/27 PR: https://git.openjdk.java.net/jdk18/pull/27 From svkamath at openjdk.java.net Wed Dec 15 17:56:52 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Wed, 15 Dec 2021 17:56:52 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 06:16:23 GMT, Smita Kamath wrote: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. Thanks for your comments Vladimir. I will make the change and move the check. ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From mchung at openjdk.java.net Wed Dec 15 19:17:53 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Wed, 15 Dec 2021 19:17:53 GMT Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation [v2] In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 17:56:23 GMT, Vladimir Kozlov wrote: >> A proper fix for this is to use the catchException combination. However, that introduces significant cold startup performance regression. JDK-8278447 tracks the work to address the performance regression using catchException and asSpreader combinator. 
It may require significant work and refactoring which is risky for JDK 18. >> >> It is proposed to implement a workaround in C2 to white list the relevant methods (all methods in sun.invoke.util.ValueConversions class) not to omit stack trace when exception is thrown in them. >> >> Added new regression test. Tested tier1-3. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Removed boot classloader check Marked as reviewed by mchung (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk18/pull/27 From coleenp at openjdk.java.net Wed Dec 15 20:06:41 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 15 Dec 2021 20:06:41 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> Message-ID: > Recent bug in misc_flags showed that they are not set concurrently and could cause bugs. Most of the misc_flags are set at classfile parsing time or at a safepoint and never reset. This change adds an assert that the flag is set once. See CR for more details. > Tested with tier1-3. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into misc - Fix typo. - Move has_resolved_methods to access flags so can be set and tested concurretly. - Move has_resolved_methods to access flags so can be set and tested concurretly. 
- 8277216: Examine InstanceKlass::_misc_flags for concurrency issues ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6851/files - new: https://git.openjdk.java.net/jdk/pull/6851/files/8f2406b7..02e8da79 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6851&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6851&range=01-02 Stats: 4124 lines in 99 files changed: 1969 ins; 1951 del; 204 mod Patch: https://git.openjdk.java.net/jdk/pull/6851.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6851/head:pull/6851 PR: https://git.openjdk.java.net/jdk/pull/6851 From iklam at openjdk.java.net Wed Dec 15 20:11:00 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 15 Dec 2021 20:11:00 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: <8Li5sXlZkk1OleY6Q5RKpDTdOQSTenCUfci5qfr9xJw=.ddce64ee-0481-4623-808d-caa33829ad8d@github.com> References: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> <8Li5sXlZkk1OleY6Q5RKpDTdOQSTenCUfci5qfr9xJw=.ddce64ee-0481-4623-808d-caa33829ad8d@github.com> Message-ID: On Wed, 15 Dec 2021 05:35:26 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> added comments about the location of vtable_len > > I approved but still think this should be targeted at 18 - assuming this was a performance regression in 18. > > Padding may have the same performance affect but will also impact footprint potentially - which in turn may impact the caching behaviour. Thanks @dholmes-ora @vnkozlov @tstuefe for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From iklam at openjdk.java.net Wed Dec 15 20:11:01 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 15 Dec 2021 20:11:01 GMT Subject: Integrated: 8278020: ~13% variation in Renaissance-Scrabble In-Reply-To: References: Message-ID: <4ec0Lfk_j0ey3a67srJRmNRQE4HTDZT_2PbDdB46NXQ=.10aa9e39-30a6-4ed1-8ce9-f269605112ee@github.com> On Tue, 14 Dec 2021 18:45:55 GMT, Ioi Lam wrote: > We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK. In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other. > > When CDS is disabled, we do not see such variations. > > In the slow case, there seems to be frequent dcache misses when loading the `Klass::_vtable_len` field, which is at offset 24 from the beginning of the Klass (see [bug report](https://bugs.openjdk.java.net/browse/JDK-8278020) for details). > > We suspect that the problem is with the layout of the CDS archive. Specifically, in CDS, Klass objects are inter-mixed with other metadata objects (such as Methods). In contrast, when CDS is disabled, (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separated from other metadata objects. > > My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore. > > As an experiment, I swapped `Klass::_vtable_len` with `Klass::_modifier_flags` (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size. > > I have no concrete proof that my theory is correct, but this change seems to be harmless. 
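To make the cache-line theory above concrete, here is a rough sketch of the arithmetic involved. It is not HotSpot code: the 64-byte line size and the helper names `line_of` and `shares_line_with_predecessor` are assumptions chosen for illustration, and it only shows when a field *can* share a cache line with whatever sits immediately before the object, not that this is what actually happens in the CDS archive.

```cpp
#include <cassert>
#include <cstdint>

// Assumed cache-line size; 64 bytes is typical on current x86-64 parts,
// but this is an illustration, not a value taken from the JDK sources.
constexpr std::uint64_t kCacheLine = 64;

// Index of the cache line that contains the given address.
constexpr std::uint64_t line_of(std::uint64_t addr) {
  return addr / kCacheLine;
}

// True if the field at (base + field_offset) lies in the same cache line
// as the last byte of whatever object is laid out right before `base`.
constexpr bool shares_line_with_predecessor(std::uint64_t base,
                                            std::uint64_t field_offset) {
  return base > 0 && line_of(base + field_offset) == line_of(base - 1);
}
```

With a hypothetical object base of 0x1010 (not cache-line aligned), a field at offset 24 shares a line with the preceding object, while a field at offset 164 does not; in fact any offset of 64 or more can never share a line with the predecessor. If the base happens to be 64-byte aligned (for example 0x1000), the field at offset 24 does not share a line either, which would be consistent with the effect appearing and disappearing between builds.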
@ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks. This pull request has now been integrated. Changeset: 4ba980ba Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/4ba980ba439f94a6b5015e64382a6c308476d63f Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod 8278020: ~13% variation in Renaissance-Scrabble Reviewed-by: dholmes, stuefe, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From kvn at openjdk.java.net Wed Dec 15 21:11:00 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 21:11:00 GMT Subject: RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2] In-Reply-To: References: <4zs58NwIBnL4hCUf9jmvCUF9NbKpybFMmiAVcwxx-JM=.1988a7ce-97d9-4bc7-8d95-4aa326cbf691@github.com> <8Li5sXlZkk1OleY6Q5RKpDTdOQSTenCUfci5qfr9xJw=.ddce64ee-0481-4623-808d-caa33829ad8d@github.com> Message-ID: On Wed, 15 Dec 2021 06:02:35 GMT, Vladimir Kozlov wrote: >> I approved but still think this should be targeted at 18 - assuming this was a performance regression in 18. >> >> Padding may have the same performance affect but will also impact footprint potentially - which in turn may impact the caching behaviour. > >> I approved but still think this should be targeted at 18 - assuming this was a performance regression in 18. >> >> Padding may have the same performance affect but will also impact footprint potentially - which in turn may impact the caching behaviour. > > I suggested padding only as experiment to prove Ioi's theory. Current changes are good as the fix. >> @vnkozlov I found that most Klasses in CDS are preceded by a Method. Does the jitted code write into a Method often? > Method counters? May be worth spreading them out better, or to pad them to prevent false sharing. I don't think compiled code updates something in Method. We need to look on fields layout. 
Compiled code does frequently update MethodCounters (invocations, loops) and MethodData (profiling counters for bytecode). Both are allocated in metaspace as classes. ------------- PR: https://git.openjdk.java.net/jdk/pull/6838 From dholmes at openjdk.java.net Wed Dec 15 21:31:02 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 15 Dec 2021 21:31:02 GMT Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation [v2] In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 17:56:23 GMT, Vladimir Kozlov wrote: >> A proper fix for this is to use the catchException combination. However, that introduces significant cold startup performance regression. JDK-8278447 tracks the work to address the performance regression using catchException and asSpreader combinator. It may require significant work and refactoring which is risky for JDK 18. >> >> It is proposed to implement a workaround in C2 to white list the relevant methods (all methods in sun.invoke.util.ValueConversions class) not to omit stack trace when exception is thrown in them. >> >> Added new regression test. Tested tier1-3. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Removed boot classloader check Looks good. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/27 From hseigel at openjdk.java.net Wed Dec 15 21:41:02 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Wed, 15 Dec 2021 21:41:02 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> Message-ID: On Wed, 15 Dec 2021 20:06:41 GMT, Coleen Phillimore wrote: >> Recent bug in misc_flags showed that they are not set concurrently and could cause bugs.
Most of the misc_flags are set at classfile parsing time or at a safepoint and never reset. This change adds an assert that the flag is set once. See CR for more details. >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into misc > - Fix typo. > - Move has_resolved_methods to access flags so can be set and tested concurretly. > - Move has_resolved_methods to access flags so can be set and tested concurretly. > - 8277216: Examine InstanceKlass::_misc_flags for concurrency issues The changes look good. Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6851 From kvn at openjdk.java.net Wed Dec 15 21:49:04 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 21:49:04 GMT Subject: [jdk18] RFR: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation [v2] In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 17:56:23 GMT, Vladimir Kozlov wrote: >> A proper fix for this is to use the catchException combination. However, that introduces significant cold startup performance regression. JDK-8278447 tracks the work to address the performance regression using catchException and asSpreader combinator. It may require significant work and refactoring which is risky for JDK 18. >> >> It is proposed to implement a workaround in C2 to white list the relevant methods (all methods in sun.invoke.util.ValueConversions class) not to omit stack trace when exception is thrown in them. >> >> Added new regression test. Tested tier1-3. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Removed boot classloader check Thank you, David, Dean and Mandy for reviews. ------------- PR: https://git.openjdk.java.net/jdk18/pull/27 From kvn at openjdk.java.net Wed Dec 15 21:49:06 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 21:49:06 GMT Subject: [jdk18] Integrated: 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 05:59:45 GMT, Vladimir Kozlov wrote: > A proper fix for this is to use the catchException combination. However, that introduces significant cold startup performance regression. JDK-8278447 tracks the work to address the performance regression using catchException and asSpreader combinator. It may require significant work and refactoring which is risky for JDK 18. > > It is proposed to implement a workaround in C2 to white list the relevant methods (all methods in sun.invoke.util.ValueConversions class) not to omit stack trace when exception is thrown in them. > > Added new regression test. Tested tier1-3. This pull request has now been integrated. 
Changeset: d3408a46 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk18/commit/d3408a46b7c8c2f8b5e41f3e286a497064a2c104 Stats: 149 lines in 7 files changed: 147 ins; 1 del; 1 mod 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation Reviewed-by: dlong, mchung, dholmes ------------- PR: https://git.openjdk.java.net/jdk18/pull/27 From svkamath at openjdk.java.net Wed Dec 15 22:53:06 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Wed, 15 Dec 2021 22:53:06 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 20:05:59 GMT, Vladimir Kozlov wrote: >> The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. > > Yes, we need to reexecute because code could be deoptimized during `new_array()` allocation. > But why we allocate this temp array in Java heap? Why not on stack in stub code? > > Also I noticed next return from intrinsics code could be moved up before we generate new nodes in graph: `if (Matcher::htbl_entries == -1) return false;` @vnkozlov Allocating the array in the stub will cause few changes on x86-64 side as well as change in aarch64 stubGenerator code as subkeyHtbl_48_entries will no longer be passed as an argument. Do let me know if you think it is okay to proceed with these changes. Thank you. 
------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From kvn at openjdk.java.net Wed Dec 15 23:08:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 15 Dec 2021 23:08:10 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: <-plTHCWteo2rESpVYcx6lAh1sG4ofFm3b9qDQbvs28Q=.af21ab52-6f95-4e8e-b9dc-08caec4dd826@github.com> On Tue, 14 Dec 2021 20:05:59 GMT, Vladimir Kozlov wrote: >> The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. > > Yes, we need to reexecute because code could be deoptimized during `new_array()` allocation. > But why we allocate this temp array in Java heap? Why not on stack in stub code? > > Also I noticed next return from intrinsics code could be moved up before we generate new nodes in graph: `if (Matcher::htbl_entries == -1) return false;` > @vnkozlov Allocating the array in the stub will cause few changes on x86-64 side as well as change in aarch64 stubGenerator code as subkeyHtbl_48_entries will no longer be passed as an argument. > Do let me know if you think it is okay to proceed with these changes. Thank you. @smita-kamath Thank you for looking on it. Okay, let proceed with your current fix for JDK 18. File RFE for JDK 19 to rework the code. Meanwhile I will test this fix. 
------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From jwilhelm at openjdk.java.net Wed Dec 15 23:29:07 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 15 Dec 2021 23:29:07 GMT Subject: RFR: Merge jdk18 Message-ID: Forwardport JDK 18 -> JDK 19 ------------- Commit messages: - Merge - 8277964: ClassCastException with no stack trace is thrown with -Xcomp in method handle invocation - 8272064: test/jdk/jdk/jfr/api/consumer/TestHiddenMethod.java needs update for JEP 416 - 8278607: Misc issues in foreign API javadoc - 8278233: [macos] tools/jpackage tests timeout due to /usr/bin/osascript - 8278758: runtime/BootstrapMethod/BSMCalledTwice.java fails with release VMs after JDK-8262134 - 8278744: KeyStore:getAttributes() not returning unmodifiable Set - 8277919: OldObjectSample event causing bloat in the class constant pool in JFR recording - 8262134: compiler/uncommontrap/TestDeoptOOM.java failed with "guarantee(false) failed: wrong number of expression stack elements during deopt" The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. Changes: https://git.openjdk.java.net/jdk/pull/6856/files Stats: 341 lines in 27 files changed: 269 ins; 12 del; 60 mod Patch: https://git.openjdk.java.net/jdk/pull/6856.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6856/head:pull/6856 PR: https://git.openjdk.java.net/jdk/pull/6856 From duke at openjdk.java.net Thu Dec 16 00:03:23 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Thu, 16 Dec 2021 00:03:23 GMT Subject: RFR: 8278868: Add x86 vectorization support for Long.bitCount() Message-ID: Vectorization support of Integer.bitCount() already exists but currently the same support is lacking for Long.bitCount(). Similar to the C2 PopCountVI node, we created a C2 PopCountVL node and used vpopcntq x86 instruction to enable vectorized Long.bitCount(). 
------------- Commit messages: - 8278868:Add x86 vectorization support for Long.bitCount() Changes: https://git.openjdk.java.net/jdk/pull/6857/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6857&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278868 Stats: 154 lines in 10 files changed: 152 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6857.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6857/head:pull/6857 PR: https://git.openjdk.java.net/jdk/pull/6857 From jwilhelm at openjdk.java.net Thu Dec 16 00:31:05 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 16 Dec 2021 00:31:05 GMT Subject: RFR: Merge jdk18 [v2] In-Reply-To: References: Message-ID: > Forwardport JDK 18 -> JDK 19 Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: - Merge - 8278020: ~13% variation in Renaissance-Scrabble Reviewed-by: dholmes, stuefe, kvn - 8274898: Cleanup usages of StringBuffer in jdk tools modules Reviewed-by: sspitsyn, lmesnik - 8269838: BasicTypeDataBase.findDynamicTypeForAddress(addr, basetype) can be simplified Reviewed-by: kevinw, sspitsyn - 8278186: org.jcp.xml.dsig.internal.dom.Utils.parseIdFromSameDocumentURI throws StringIndexOutOfBoundsException when calling substring method Reviewed-by: mullan - 8278241: Implement JVM SpinPause on linux-aarch64 Reviewed-by: aph, phh - 8278842: Parallel: Remove unused VerifyObjectStartArrayClosure::_old_gen Reviewed-by: tschatzl - 8278548: G1: Remove unnecessary check in forward_to_block_containing_addr Reviewed-by: tschatzl, mli, sjohanss - 8202579: Revisit VM_Version and VM_Version_ext for overlap and consolidation Reviewed-by: dholmes, hseigel - 8278351: Add function to retrieve worker_id from any context Reviewed-by: eosterlund, kbarrett, ayang - ... 
and 43 more: https://git.openjdk.java.net/jdk/compare/6d63c6dd...fa3d80e6 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6856/files - new: https://git.openjdk.java.net/jdk/pull/6856/files/fa3d80e6..fa3d80e6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6856&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6856&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6856.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6856/head:pull/6856 PR: https://git.openjdk.java.net/jdk/pull/6856 From jwilhelm at openjdk.java.net Thu Dec 16 00:31:08 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 16 Dec 2021 00:31:08 GMT Subject: Integrated: Merge jdk18 In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 23:18:45 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 18 -> JDK 19 This pull request has now been integrated. Changeset: e6b28e05 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/e6b28e05c6f7698f230b04199932d4fc81f41a89 Stats: 341 lines in 27 files changed: 269 ins; 12 del; 60 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/6856 From iklam at openjdk.java.net Thu Dec 16 04:05:25 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 16 Dec 2021 04:05:25 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes Message-ID: Cause of crash: When dumping a CDS archive, while iterating over entries of the `SystemDictionaryShared::_dumptime_table`, we do not check whether the classes are already unloaded. In the crash, we are trying to call `InstanceKlass::signer()` but the class has already been unloaded. Fix: Override the template function `DumpTimeSharedClassTable::iterate` to ensure iteration safety. Do not iterate over a class if its `class_loader_data` is no longer alive. 
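The fix described above (skip any table entry whose class loader is no longer alive) can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeKlass`, `FakeInfo` and `DumpTimeTableSketch` are hypothetical stand-ins, not the HotSpot types, and the sketch assumes the klass objects themselves remain readable while the liveness check runs. Only the idea of filtering through `is_loader_alive()` during iteration is taken from this thread.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a class whose loader may have been unloaded.
struct FakeKlass {
  std::string name;
  bool loader_alive;
  bool is_loader_alive() const { return loader_alive; }
};

// Hypothetical stand-in for the per-class dump-time information.
struct FakeInfo {
  int id;
};

class DumpTimeTableSketch {
 public:
  void add(const FakeKlass* k, FakeInfo info) { table_[k] = info; }

  // Calls visit(klass, info) only for entries whose loader is still alive,
  // and returns the number of entries that were actually visited.
  template <typename Visitor>
  int iterate(Visitor visit) const {
    int visited = 0;
    for (const auto& entry : table_) {
      const FakeKlass* k = entry.first;
      if (!k->is_loader_alive()) {
        continue;  // never hand an unloaded class to the visitor
      }
      visit(*k, entry.second);
      ++visited;
    }
    return visited;
  }

 private:
  std::unordered_map<const FakeKlass*, FakeInfo> table_;
};
```

Routing every traversal through one filtering `iterate` function means individual callers cannot forget the liveness check, which appears to be the intent of overriding `DumpTimeSharedClassTable::iterate` in the PR.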
The assert in `DumpTimeSharedClassTable::IterationHelper` found another existing bug -- we were calling `SystemDictionaryShared::is_dumptime_table_empty()` without holding the `DumpTimeTable_lock`. I delayed the call until we have grabbed the lock. Testing: I have attached a test case into the bug report. Without the fix, it would reproduce the same crash in less than a minute. With the fix, the crash is no longer reproducible. Unfortunately, the test case requires a ZGC patch (thanks to @stefank) that adds delays to increase the likelihood of seeing unloaded classes inside the `_dumptime_table`. Therefore, I cannot integrate the test as a jtreg test. I'll mark the bug as **noreg-hard** ------------- Commit messages: - cleaned up code - add #if INCLUDE_CDS - Merge branch 'master' into 8278602-cds-zgc-class-unload-bug - using k->is_loader_alive() is enough - Added DumpTimeSharedClassTable::iterate() to make sure every iteration goes through EligibleClassIterationHelper - step1 Changes: https://git.openjdk.java.net/jdk/pull/6859/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6859&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278602 Stats: 104 lines in 6 files changed: 96 ins; 4 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6859.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6859/head:pull/6859 PR: https://git.openjdk.java.net/jdk/pull/6859 From dholmes at openjdk.java.net Thu Dec 16 04:53:00 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 16 Dec 2021 04:53:00 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> Message-ID: On Wed, 15 Dec 2021 20:06:41 GMT, Coleen Phillimore wrote: >> Recent bug in misc_flags showed that they are not set concurrently and could cause bugs. 
Most of the misc_flags are set at classfile parsing time or at a safepoint and never reset. This change adds an assert that the flag is set once. See CR for more details. >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into misc > - Fix typo. > - Move has_resolved_methods to access flags so can be set and tested concurretly. > - Move has_resolved_methods to access flags so can be set and tested concurretly. > - 8277216: Examine InstanceKlass::_misc_flags for concurrency issues Hi Coleen, I like the intent here but the assertions only partially work. Given something like: void set_has_nonstatic_fields(bool b) { assert(!has_nonstatic_fields(), "set once"); then you can still call the set function multiple times if passing false as the argument. ------------- PR: https://git.openjdk.java.net/jdk/pull/6851 From dholmes at openjdk.java.net Thu Dec 16 04:59:00 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 16 Dec 2021 04:59:00 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> Message-ID: <88icA0lKRKAI39PfWecShRwPAFUJ93M0_c1zArJP92w=.02d04351-af03-40f4-8067-5bb2dfba3403@github.com> On Wed, 15 Dec 2021 20:06:41 GMT, Coleen Phillimore wrote: >> Recent bug in misc_flags showed that they are not set concurrently and could cause bugs. Most of the misc_flags are set at classfile parsing time or at a safepoint and never reset. This change adds an assert that the flag is set once. See CR for more details. >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into misc > - Fix typo. > - Move has_resolved_methods to access flags so can be set and tested concurretly. > - Move has_resolved_methods to access flags so can be set and tested concurretly. > - 8277216: Examine InstanceKlass::_misc_flags for concurrency issues Could we not simply reserve the `_misc_flags` for only those properties set at class parsing and then consider them "frozen" with a new debug field `bool _misc_flags_frozen` that we can check in an assertion in the various set functions? ------------- PR: https://git.openjdk.java.net/jdk/pull/6851 From dholmes at openjdk.java.net Thu Dec 16 05:13:02 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 16 Dec 2021 05:13:02 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> Message-ID: On Wed, 15 Dec 2021 20:06:41 GMT, Coleen Phillimore wrote: >> Recent bug in misc_flags showed that they are not set concurrently and could cause bugs. Most of the misc_flags are set at classfile parsing time or at a safepoint and never reset. This change adds an assert that the flag is set once. See CR for more details. >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into misc > - Fix typo. > - Move has_resolved_methods to access flags so can be set and tested concurretly. > - Move has_resolved_methods to access flags so can be set and tested concurretly. 
> - 8277216: Examine InstanceKlass::_misc_flags for concurrency issues Or ... assuming these flags are always off by default and only ever turned on, then the set functions should not be taking any argument and should just simply set the given flag bit. Then the asserts would work fine. ------------- PR: https://git.openjdk.java.net/jdk/pull/6851 From kvn at openjdk.java.net Thu Dec 16 07:26:57 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 16 Dec 2021 07:26:57 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 06:16:23 GMT, Smita Kamath wrote: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. Unfortunately I hit assert during CTW testing on Windows-x64 when compiling `com.sun.crypto.provider.GaloisCounterMode::implGCMCrypt`. 
# Internal Error (t:\workspace\open\src\hotspot\share\opto\graphKit.cpp:250), pid=41320, tid=43308
# assert(ex_map->jvms()->same_calls_as(_exceptions->jvms())) failed: all collected exceptions must come from the same place

Current CompileTask:
C2: 7834 2950 b 4 com.sun.crypto.provider.GaloisCounterMode::implGCMCrypt (100 bytes)

Stack: [0x000000c86a600000,0x000000c86a700000]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [jvm.dll+0xbb90d1] os::platform_print_native_stack+0xf1 (os_windows_x86.cpp:235)
V [jvm.dll+0xdf904e] VMError::report+0x101e (vmError.cpp:828)
V [jvm.dll+0xdfaa4e] VMError::report_and_die+0x7fe (vmError.cpp:1656)
V [jvm.dll+0xdfb1d4] VMError::report_and_die+0x64 (vmError.cpp:1437)
V [jvm.dll+0x536f47] report_vm_error+0xb7 (debug.cpp:280)
V [jvm.dll+0x6da439] GraphKit::add_exception_states_from+0x119 (graphKit.cpp:286)
V [jvm.dll+0x414375] PredicatedIntrinsicGenerator::generate+0x7f5 (callGenerator.cpp:1358)
V [jvm.dll+0x5c3b62] Parse::do_call+0x9c2 (doCall.cpp:651)
V [jvm.dll+0xbe3745] Parse::do_one_bytecode+0x32b5 (parse2.cpp:2704)
V [jvm.dll+0xbd5ae7] Parse::do_one_block+0x437 (parse1.cpp:1557)
V [jvm.dll+0xbd463c] Parse::do_all_blocks+0x5cc (parse1.cpp:710)
V [jvm.dll+0xbd0c3d] Parse::Parse+0xc1d (parse1.cpp:616)
V [jvm.dll+0x413a15] ParseGenerator::generate+0xa5 (callGenerator.cpp:103)
V [jvm.dll+0x4eba90] Compile::Compile+0x1110 (compile.cpp:714)

The call stack has `PredicatedIntrinsicGenerator` so it seems related to changes. I got replay file and will try to reproduce it tomorrow.
------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From duke at openjdk.java.net Thu Dec 16 08:09:04 2021 From: duke at openjdk.java.net (Tobias Holenstein) Date: Thu, 16 Dec 2021 08:09:04 GMT Subject: RFR: JDK-8278329: some TraceDeoptimization code not included in PRODUCT build In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 14:12:59 GMT, Doug Simon wrote:

>> After "JDK-8154011: Make `TraceDeoptimization` a diagnostic flag" some code was not included in the PRODUCT build.
>>
>> Removed all the #ifndef PRODUCT guards around `TraceDeoptimization` checks and made sure to be consistent.
>>
>> The DEOPT PACKING messages were controlled by `PrintDeoptimizationDetails` (develop flag), but DEOPT UNPACKING is controlled by `TraceDeoptimization` (product flag),. Therefore changed DEOPT PACKING messages to be controlled by `TraceDeoptimization` as well.
>>
>> Checked that tests are not affected. Checked on Aurora that performance is not affected.
>
> src/hotspot/share/runtime/deoptimization.cpp line 734:
>
>> 732: tty->print_cr("DEOPT UNPACKING thread " INTPTR_FORMAT " vframeArray " INTPTR_FORMAT " mode %d",
>> 733: p2i(thread), p2i(array), exec_mode);
>> 734: tty->cr();
>
> Same question as above about the necessity of `cr()`.

DEOPT PACKING thread 0x00007f9b9b008a20 Compiled frame (sp=0x0000700002121f00 unextended sp=0x0000700002121f00, fp=0x0000000000000000, real_fp=0x0000700002121f30, pc=0x000000011bad5944)
nmethod 1714 154 4 java.lang.String::indexOf (29 bytes)
Virtual frames (innermost first):
0 - (0x00007f9b9ff91038) - ifge @ bci 13
1 - (0x00007f9b9ff923a0) - invokestatic @ bci 13
Created vframeArray 0x00007f9ba280a820

DEOPT UNPACKING thread 0x00007f9b9b008a20 vframeArray 0x00007f9ba280a820 mode 2

{method} {0x000000012d00b8c8} 'indexOf' '(II)I' in 'java/lang/String' - invokestatic @ bci 13 sp = 0x0000700002121eb8
Expressions size: 0
Locals size: 3
[1. Interpreted Frame]
[...]

I added it to separate the `DEOPT PACKING`, `DEOPT UNPACKING` and `SAVED OOP RESULT` blocks for readability. But I don't mind removing it if a compact log is preferred. ------------- PR: https://git.openjdk.java.net/jdk/pull/6746 From stefank at openjdk.java.net Thu Dec 16 08:33:04 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 16 Dec 2021 08:33:04 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes In-Reply-To: References: Message-ID: On Thu, 16 Dec 2021 03:46:10 GMT, Ioi Lam wrote: > Cause of crash: > > When dumping a CDS archive, while iterating over entries of the `SystemDictionaryShared::_dumptime_table`, we do not check whether the classes are already unloaded. In the crash, we are trying to call `InstanceKlass::signer()` but the class has already been unloaded. > > Fix: > > Override the template function `DumpTimeSharedClassTable::iterate` to ensure iteration safety. Do not iterate over a class if its `class_loader_data` is no longer alive. > > The assert in `DumpTimeSharedClassTable::IterationHelper` found another existing bug -- we were calling `SystemDictionaryShared::is_dumptime_table_empty()` without holding the `DumpTimeTable_lock`. I delayed the call until we have grabbed the lock. > > Testing: > > I have attached a test case into the bug report. Without the fix, it would reproduce the same crash in less than a minute. With the fix, the crash is no longer reproducible. > > Unfortunately, the test case requires a ZGC patch (thanks to @stefank) that adds delays to increase the likelihood of seeing unloaded classes inside the `_dumptime_table`. Therefore, I cannot integrate the test as a jtreg test. I'll mark the bug as **noreg-hard** I've reviewed the interaction of the klasses in the _dumptime_table with the new is_loader_alive() check. I don't know the rest of the CDS code well enough to know whether the other changes are correct.
I spotted something that looks weird: src/hotspot/share/classfile/systemDictionaryShared.cpp line 194: > 192: _dump_in_progress = true; > 193: } > 194: Did you really intend to set _dump_in_progress to true in stop_dumping()? start_dumping() also sets it to true. ------------- Changes requested by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6859 From dnsimon at openjdk.java.net Thu Dec 16 08:38:02 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 16 Dec 2021 08:38:02 GMT Subject: RFR: JDK-8278329: some TraceDeoptimization code not included in PRODUCT build In-Reply-To: References: Message-ID: On Thu, 16 Dec 2021 08:05:39 GMT, Tobias Holenstein wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 734: >> >>> 732: tty->print_cr("DEOPT UNPACKING thread " INTPTR_FORMAT " vframeArray " INTPTR_FORMAT " mode %d", >>> 733: p2i(thread), p2i(array), exec_mode); >>> 734: tty->cr(); >> >> Same question as above about the necessity of `cr()`. > > DEOPT PACKING thread 0x00007f9b9b008a20 Compiled frame (sp=0x0000700002121f00 unextended sp=0x0000700002121f00, fp=0x0000000000000000, real_fp=0x0000700002121f30, pc=0x000000011bad5944) > nmethod 1714 154 4 java.lang.String::indexOf (29 bytes) > Virtual frames (innermost first): > 0 - (0x00007f9b9ff91038) - ifge @ bci 13 > 1 - (0x00007f9b9ff923a0) - invokestatic @ bci 13 > Created vframeArray 0x00007f9ba280a820 > > DEOPT UNPACKING thread 0x00007f9b9b008a20 vframeArray 0x00007f9ba280a820 mode 2 > > {method} {0x000000012d00b8c8} 'indexOf' '(II)I' in 'java/lang/String' - invokestatic @ bci 13 sp = 0x0000700002121eb8 > Expressions size: 0 > Locals size: 3 > [1. Interpreted Frame] > [...] > > I have put it for separating the `DEOPT PACKING`, `DEOPT UNPACKING` and `SAVED OOP RESULT` blocks for readability reasons. But I don't mind removing it, if a compact log is preferred. I don't feel strongly about it but I think `-XX:+TraceDeoptimization -XX:-PrintDeoptimizationDetails` has never printed blank lines. 
However, this is primarily for human consumption so it's fine to leave as you have it. ------------- PR: https://git.openjdk.java.net/jdk/pull/6746 From dnsimon at openjdk.java.net Thu Dec 16 09:04:14 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 16 Dec 2021 09:04:14 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob Message-ID: This PR fixes a discrepancy between `MethodData::_trap_hist_limit` and `Deoptimization::Reason_LIMIT` in terms of JVMCI-specific usage. JVMCI doubles the size of the deopt history for a method so that it can distinguish deopts in a normal method compilation from deopts in an OSR compilation: union { intptr_t _align; u1 _array[JVMCI_ONLY(2 *) MethodData::_trap_hist_limit]; } _trap_hist; To access the history for OSR deopts, the index for a deopt reason needs to be adjusted to the upper half of the history array. The amount used for the adjustment was incorrect and this PR fixes it: if (update_total_trap_count) { uint idx = reason; #if INCLUDE_JVMCI if (is_osr) { idx += Reason_TRAP_HISTORY_LENGTH; } #endif I introduced `Reason_TRAP_HISTORY_LENGTH` as a replacement for `25 JVMCI_ONLY(+5), // decoupled from Deoptimization::Reason_LIMIT` as this decoupling is unnecessary (and dangerous) as far as I can see. 
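The index adjustment described above can be sketched in isolation. This is a minimal illustration, not the HotSpot code: `kReasonCount`, `kTrackOsrSeparately`, `kSlots` and `TrapHistory` are hypothetical names, and the value 8 is arbitrary. It only shows why the offset added for OSR deopts has to equal the length of the lower half of the array.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; names and the value 8 are illustrative only.
constexpr unsigned kReasonCount = 8;        // plays the role of Reason_TRAP_HISTORY_LENGTH
constexpr bool kTrackOsrSeparately = true;  // plays the role of INCLUDE_JVMCI
constexpr unsigned kSlots = (kTrackOsrSeparately ? 2 : 1) * kReasonCount;

struct TrapHistory {
  uint8_t counts[kSlots] = {};

  // Map (reason, is_osr) to a slot. OSR deopts use the upper half, so the
  // offset must be exactly the number of reasons stored in the lower half.
  static unsigned index_for(unsigned reason, bool is_osr) {
    assert(reason < kReasonCount && "reason out of range");
    unsigned idx = reason;
    if (kTrackOsrSeparately && is_osr) {
      idx += kReasonCount;
    }
    assert(idx < kSlots && "oob");
    return idx;
  }

  // Saturating increment so that a u1-sized counter never wraps around.
  void record(unsigned reason, bool is_osr) {
    uint8_t& slot = counts[index_for(reason, is_osr)];
    if (slot < UINT8_MAX) {
      ++slot;
    }
  }
};
```

Because both the array size and the offset are derived from the same constant, the bounds check cannot drift out of step with the array declaration.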
------------- Commit messages: - fix index used to access trap history in OSR compiled method Changes: https://git.openjdk.java.net/jdk/pull/6855/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6855&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278871 Stats: 17 lines in 5 files changed: 7 ins; 4 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/6855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6855/head:pull/6855 PR: https://git.openjdk.java.net/jdk/pull/6855 From mli at openjdk.java.net Thu Dec 16 09:37:34 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 16 Dec 2021 09:37:34 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure Message-ID: The original PR is #6763, which should be retired because we have decided to adjust part of the optimization for evacuation failure (see #6627 for details), so the log will be adjusted accordingly. With this patch, the basic log related to evacuation failure looks like this: 
[13.126s][debug][gc,phases] GC(0) Restore Retained Regions (ms): Min: 0.0, Avg: 197.4, Max: 1579.1, Diff: 1579.1, Sum: 1579.1, Workers: 8 [13.126s][debug][gc,phases] GC(0) Evacuation Failure Regions: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 1, Workers: 1 ------------- Commit messages: - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/6860/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6860&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278282 Stats: 22 lines in 6 files changed: 15 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/6860.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6860/head:pull/6860 PR: https://git.openjdk.java.net/jdk/pull/6860 From mli at openjdk.java.net Thu Dec 16 10:13:03 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 16 Dec 2021 10:13:03 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: References: Message-ID: On Wed, 8 Dec 2021 11:30:45 GMT, Hamlin Li wrote: > This is to get information about the pause time distribution (prepare(copy, sorting, ...) , process (iterate) and cleanup) and region/objects/size statistics when processing evacuation failure objects in "Remove Self Forwards", this information will be helpful when optimize the evacuation failure processing subsequently, and will also be helpful for users to analyze and troubleshoot in the future. > > > [10.917s][debug][gc,phases] GC(0) Restore Retained Regions (ms): ... > [10.917s][debug][gc,phases] GC(0) Retained Regions: ... > [10.917s][trace][gc,phases] GC(0) Prepare Retained Object Refs (ms): ... > [10.917s][trace][gc,phases] GC(0) Reformat Retained Regions (ms): ... > [10.917s][trace][gc,phases] GC(0) Retained Objects: ... > [10.917s][trace][gc,phases] GC(0) Retained Bytes: ... > [10.917s][trace][gc,phases] GC(0) Reclaim Memory (ms): ... > [10.917s][trace][gc,phases] GC(0) Used [Native] Memory: ... 
Closing this one; the new PR is #6860. ------------- PR: https://git.openjdk.java.net/jdk/pull/6763 From mli at openjdk.java.net Thu Dec 16 10:13:03 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 16 Dec 2021 10:13:03 GMT Subject: Withdrawn: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: References: Message-ID: <6deKNOgMWFoUB1lXOOTxmCaW2QXlbflozxwjm5oAUXc=.692a2a01-6532-4687-97cb-7bc6a1b7b8bc@github.com> On Wed, 8 Dec 2021 11:30:45 GMT, Hamlin Li wrote: > This is to get information about the pause time distribution (prepare(copy, sorting, ...) , process (iterate) and cleanup) and region/objects/size statistics when processing evacuation failure objects in "Remove Self Forwards", this information will be helpful when optimize the evacuation failure processing subsequently, and will also be helpful for users to analyze and troubleshoot in the future. > > > [10.917s][debug][gc,phases] GC(0) Restore Retained Regions (ms): ... > [10.917s][debug][gc,phases] GC(0) Retained Regions: ... > [10.917s][trace][gc,phases] GC(0) Prepare Retained Object Refs (ms): ... > [10.917s][trace][gc,phases] GC(0) Reformat Retained Regions (ms): ... > [10.917s][trace][gc,phases] GC(0) Retained Objects: ... > [10.917s][trace][gc,phases] GC(0) Retained Bytes: ... > [10.917s][trace][gc,phases] GC(0) Reclaim Memory (ms): ... > [10.917s][trace][gc,phases] GC(0) Used [Native] Memory: ... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6763 From phedlin at openjdk.java.net Thu Dec 16 11:37:02 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Thu, 16 Dec 2021 11:37:02 GMT Subject: [jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 10:45:28 GMT, Patric Hedlin wrote: > Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support. 
> > The motivation is found in the original x86 issue ([JDK-8274242](https://bugs.openjdk.java.net/browse/JDK-8274242)). > > Implementation with slight focus on balance between footprint and efficiency, trying to utilise a dual SIMD path (e.g. Neoverse N1) for the additional Ascii-check and avoid performance loss in the ISO-only case. > > - Interleaved ISO and ASCII check code. > - Avoid 'umaxv' in the ISO main flow. > - Using post inc in main loop. > - Retain 8-char loop. > - Removing conditional prefetch (no upside). > - Adding ISO-8859-1 to encode-decode benchmark. > > Testing: tier1-6 > > The revised version compares like this (master vs. update). > > Benchmark (size) (type) Mode Cnt Score Error Units > CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 17.920 ± 0.229 us/op > CharsetEncodeDecode.encode 16384 BIG5 avgt 30 18.867 ± 0.356 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 17.419 ± 0.220 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-1 avgt 30 6.200 ± 0.134 us/op > CharsetEncodeDecode.encode 16384 ASCII avgt 30 17.149 ± 0.219 us/op > CharsetEncodeDecode.encode 16384 UTF-16 avgt 30 135.115 ± 1.440 us/op > > > Benchmark (size) (type) Mode Cnt Score Error Units > CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 9.018 ± 0.179 us/op > CharsetEncodeDecode.encode 16384 BIG5 avgt 30 10.550 ± 0.470 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 8.843 ± 0.187 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-1 avgt 30 6.406 ± 0.155 us/op > CharsetEncodeDecode.encode 16384 ASCII avgt 30 8.822 ± 0.173 us/op > CharsetEncodeDecode.encode 16384 UTF-16 avgt 30 135.195 ± 1.432 us/op So this will obviously be prolonged. Closing and moving to 19. 
------------- PR: https://git.openjdk.java.net/jdk18/pull/20 From phedlin at openjdk.java.net Thu Dec 16 11:37:03 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Thu, 16 Dec 2021 11:37:03 GMT Subject: [jdk18] Withdrawn: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 10:45:28 GMT, Patric Hedlin wrote: > Implementation of ISO/ASCII char set encoding, extending current implementation with ASCII encoding support. > > The motivation is found in the original x86 issue ([JDK-8274242](https://bugs.openjdk.java.net/browse/JDK-8274242)). > > Implementation with slight focus on balance between footprint and efficiency, trying to utilise a dual SIMD path (e.g. Neoverse N1) for the additional Ascii-check and avoid performance loss in the ISO-only case. > > - Interleaved ISO and ASCII check code. > - Avoid 'umaxv' in the ISO main flow. > - Using post inc in main loop. > - Retain 8-char loop. > - Removing conditional prefetch (no upside). > - Adding ISO-8859-1 to encode-decode benchmark. > > Testing: tier1-6 > > The revised version compares like this (master vs. update). > > Benchmark (size) (type) Mode Cnt Score Error Units > CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 17.920 ± 0.229 us/op > CharsetEncodeDecode.encode 16384 BIG5 avgt 30 18.867 ± 0.356 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 17.419 ± 0.220 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-1 avgt 30 6.200 ± 0.134 us/op > CharsetEncodeDecode.encode 16384 ASCII avgt 30 17.149 ± 0.219 us/op > CharsetEncodeDecode.encode 16384 UTF-16 avgt 30 135.115 ± 1.440 us/op > > > Benchmark (size) (type) Mode Cnt Score Error Units > CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 9.018 ± 0.179 us/op > CharsetEncodeDecode.encode 16384 BIG5 avgt 30 10.550 ± 0.470 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 8.843 ± 0.187 us/op > CharsetEncodeDecode.encode 16384 ISO-8859-1 avgt 30 6.406 ± 
0.155 us/op > CharsetEncodeDecode.encode 16384 ASCII avgt 30 8.822 ± 0.173 us/op > CharsetEncodeDecode.encode 16384 UTF-16 avgt 30 135.195 ± 1.432 us/op This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk18/pull/20 From aph at openjdk.java.net Thu Dec 16 13:18:59 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 16 Dec 2021 13:18:59 GMT Subject: [jdk18] RFR: 8274243: Implement fast-path for ASCII-compatible CharsetEncoders on aarch64 In-Reply-To: References: Message-ID: <9yDa1xxnZh3yLMjNhmxlf4lPduEbNKWVKDJdNQ3_-s4=.4fd7a715-8258-4605-9f6f-c7865db99571@github.com> On Thu, 16 Dec 2021 11:33:11 GMT, Patric Hedlin wrote: > So this will obviously be prolonged. Closing and moving to 19. That sounds right. Let's get it into mainline soon, and we can do a backport to 18.1. ------------- PR: https://git.openjdk.java.net/jdk18/pull/20 From coleenp at openjdk.java.net Thu Dec 16 13:40:02 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 16 Dec 2021 13:40:02 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> Message-ID: On Wed, 15 Dec 2021 20:06:41 GMT, Coleen Phillimore wrote: >> Recent bug in misc_flags showed that they are not set concurrently and could cause bugs. Most of the misc_flags are set at classfile parsing time or at a safepoint and never reset. This change adds an assert that the flag is set once. See CR for more details. >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into misc > - Fix typo. 
> - Move has_resolved_methods to access flags so can be set and tested concurretly. > - Move has_resolved_methods to access flags so can be set and tested concurretly. > - 8277216: Examine InstanceKlass::_misc_flags for concurrency issues Setting misc_flags to false repeatedly isn't going to break anything wrt. concurrency. If you look at the CR, there are misc_flags that are not set during parsing. We could separate them to parsing ones and not parsing ones (two u1 flags) maybe. Seemed not worth the effort. I looked at making the setting functions for misc_flags not take a bool but then I'd have to do this in ClassFileParser, which seemed more verbose: ik->set_has_nonstatic_fields(_field_info->_has_nonstatic_fields); change to: if (_field_info->_has_nonstatic_fields) { set_has_nonstatic_fields(); } So worse, for not a very compelling reason. I'm actually not that happy with moving misc_flags (set has_resolved_methods) to access_flags in order to get atomic semantics. I want access flags to be things in the class file not control information (control information should be in the metadata instead). If I redid this change, that's what I'd do but it affects more than Klass access_flags. This change helps us not fall into the trap that caused the bug that I recently fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6851 From kvn at openjdk.java.net Thu Dec 16 17:47:00 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 16 Dec 2021 17:47:00 GMT Subject: RFR: JDK-8278329: some TraceDeoptimization code not included in PRODUCT build In-Reply-To: References: Message-ID: <-e4Dc5GXZvVhkqxV1gLK2G589SGH9q4g8K2UHHRyutw=.63c4fafc-099c-42f8-b355-31dc055fe2bc@github.com> On Tue, 7 Dec 2021 14:46:05 GMT, Tobias Holenstein wrote: > After "JDK-8154011: Make `TraceDeoptimization` a diagnostic flag" some code was not included in the PRODUCT build. > > Removed all the #ifndef PRODUCT guards around `TraceDeoptimization` checks and made sure to be consistent. 
> > The DEOPT PACKING messages were controlled by `PrintDeoptimizationDetails` (develop flag), but DEOPT UNPACKING is controlled by `TraceDeoptimization` (product flag),. Therefore changed DEOPT PACKING messages to be controlled by `TraceDeoptimization` as well. > > Checked that tests are not affected. Checked on Aurora that performance is not affected. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6746 From coleenp at openjdk.java.net Thu Dec 16 17:54:00 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 16 Dec 2021 17:54:00 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes In-Reply-To: References: Message-ID: <6BsQfcGpBdOwhpBerFKrDF7vfpicS-ZB3XG32_mus74=.ca348cca-f2fe-43d7-8161-1d7b709f1c86@github.com> On Thu, 16 Dec 2021 03:46:10 GMT, Ioi Lam wrote: > Cause of crash: > > When dumping a CDS archive, while iterating over entries of the `SystemDictionaryShared::_dumptime_table`, we do not check whether the classes are already unloaded. In the crash, we are trying to call `InstanceKlass::signer()` but the class has already been unloaded. > > Fix: > > Override the template function `DumpTimeSharedClassTable::iterate` to ensure iteration safety. Do not iterate over a class if its `class_loader_data` is no longer alive. > > The assert in `DumpTimeSharedClassTable::IterationHelper` found another existing bug -- we were calling `SystemDictionaryShared::is_dumptime_table_empty()` without holding the `DumpTimeTable_lock`. I delayed the call until we have grabbed the lock. > > Testing: > > I have attached a test case into the bug report. Without the fix, it would reproduce the same crash in less than a minute. With the fix, the crash is no longer reproducible. > > Unfortunately, the test case requires a ZGC patch (thanks to @stefank) that adds delays to increase the likelihood of seeing unloaded classes inside the `_dumptime_table`. 
Therefore, I cannot integrate the test as a jtreg test. I'll mark the bug as **noreg-hard** Could you also add your unloads a lot test even though it doesn't reproduce this particular error without the ZGC change? It might find a similar bug under stress conditions. src/hotspot/share/cds/dumpTimeClassInfo.inline.hpp line 53: > 51: assert_lock_strong(DumpTimeTable_lock); > 52: if (k->is_loader_alive()) { > 53: assert(k->is_loader_alive(), "must be"); This does seem a bit paranoid and redundant here. src/hotspot/share/cds/dumpTimeClassInfo.inline.hpp line 58: > 56: return result; > 57: } else { > 58: if (!SystemDictionaryShared::is_excluded_class(k)) { I thought this was the original bug? is_excluded_class() looks at mirror->signers() which if the class isn't alive, mirror->signers() will crash. This has to be in the k->is_loader_alive() too. ------------- Changes requested by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6859 From never at openjdk.java.net Thu Dec 16 17:59:59 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Thu, 16 Dec 2021 17:59:59 GMT Subject: RFR: JDK-8278329: some TraceDeoptimization code not included in PRODUCT build In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 14:46:05 GMT, Tobias Holenstein wrote: > After "JDK-8154011: Make `TraceDeoptimization` a diagnostic flag" some code was not included in the PRODUCT build. > > Removed all the #ifndef PRODUCT guards around `TraceDeoptimization` checks and made sure to be consistent. > > The DEOPT PACKING messages were controlled by `PrintDeoptimizationDetails` (develop flag), but DEOPT UNPACKING is controlled by `TraceDeoptimization` (product flag),. Therefore changed DEOPT PACKING messages to be controlled by `TraceDeoptimization` as well. > > Checked that tests are not affected. Checked on Aurora that performance is not affected. Thanks for cleaning this up. 
I think one or two things should be moved under PrintDeoptimizationDetails for consistency but I wanted to look more closely at the output before adding my comments. I'll have some comments up soon. ------------- PR: https://git.openjdk.java.net/jdk/pull/6746 From kvn at openjdk.java.net Thu Dec 16 18:30:56 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 16 Dec 2021 18:30:56 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 23:11:58 GMT, Doug Simon wrote: > This PR fixes a discrepancy between `MethodData::_trap_hist_list` and `Deoptimization::Reason_LIMIT` in terms of JVMCI specific usage. > > JVMCI doubles the size of the deopt history for a method so that it can distinguish deopts in a normal method compilation from deopts in an OSR compilation: > > union { > intptr_t _align; > u1 _array[JVMCI_ONLY(2 *) MethodData::_trap_hist_limit]; > } _trap_hist; > > > To access the history for OSR deopts, the index for a deopt reason needs to be adjusted to the upper half of the history array. The amount used for the adjustment was incorrect and this PR fixes it: > > > if (update_total_trap_count) { > uint idx = reason; > #if INCLUDE_JVMCI > if (is_osr) { > idx += Reason_TRAP_HISTORY_LENGTH; > } > #endif > > > I introduced `Reason_TRAP_HISTORY_LENGTH` as a replacement for `25 JVMCI_ONLY(+5), // decoupled from Deoptimization::Reason_LIMIT` as this decoupling is unnecessary (as dangerous) as far as I can see. Yes, I looked at the history and unfortunately we did not always update `_trap_hist_limit`. It is not causing any big issues, but we may lose some trap information. Looks good. I suggest doing regular testing too, since the change affects shared code. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6855 From never at openjdk.java.net Thu Dec 16 19:18:56 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Thu, 16 Dec 2021 19:18:56 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob In-Reply-To: References: Message-ID: <25NnHmVYKa3uaEB2PSkvbqlX8iwIFYItF2rDgcrC3ME=.56d9c8e8-c3b2-4f52-b8dd-ad8c88ddb4ba@github.com> On Wed, 15 Dec 2021 23:11:58 GMT, Doug Simon wrote: > This PR fixes a discrepancy between `MethodData::_trap_hist_list` and `Deoptimization::Reason_LIMIT` in terms of JVMCI specific usage. > > JVMCI doubles the size of the deopt history for a method so that it can distinguish deopts in a normal method compilation from deopts in an OSR compilation: > > union { > intptr_t _align; > u1 _array[JVMCI_ONLY(2 *) MethodData::_trap_hist_limit]; > } _trap_hist; > > > To access the history for OSR deopts, the index for a deopt reason needs to be adjusted to the upper half of the history array. The amount used for the adjustment was incorrect and this PR fixes it: > > > if (update_total_trap_count) { > uint idx = reason; > #if INCLUDE_JVMCI > if (is_osr) { > idx += Reason_TRAP_HISTORY_LENGTH; > } > #endif > > > I introduced `Reason_TRAP_HISTORY_LENGTH` as a replacement for `25 JVMCI_ONLY(+5), // decoupled from Deoptimization::Reason_LIMIT` as this decoupling is unnecessary (as dangerous) as far as I can see. Marked as reviewed by never (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6855 From dholmes at openjdk.java.net Thu Dec 16 21:54:59 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 16 Dec 2021 21:54:59 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> Message-ID: <8GQyRbN0TM0QQUFQ4c4qOlNRVyruDnVJ5LIw8V0MAZs=.126cca3c-16ae-42a0-9b4c-869ee0fdd23e@github.com> On Thu, 16 Dec 2021 13:37:19 GMT, Coleen Phillimore wrote: > Setting misc_flags to false repeatedly isn't going to break anything wrt. concurrency. Setting any of the bits concurrently is broken. If one piece of code is turning on bit N and another turning off bit M then the latter can cause the former to be lost. ------------- PR: https://git.openjdk.java.net/jdk/pull/6851 From dholmes at openjdk.java.net Thu Dec 16 21:54:58 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 16 Dec 2021 21:54:58 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> Message-ID: On Wed, 15 Dec 2021 20:06:41 GMT, Coleen Phillimore wrote: >> Recent bug in misc_flags showed that they are not set concurrently and could cause bugs. Most of the misc_flags are set at classfile parsing time or at a safepoint and never reset. This change adds an assert that the flag is set once. See CR for more details. >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into misc > - Fix typo. > - Move has_resolved_methods to access flags so can be set and tested concurretly. 
> - Move has_resolved_methods to access flags so can be set and tested concurretly. > - 8277216: Examine InstanceKlass::_misc_flags for concurrency issues There is more potential for tightening things up around these flags, but the additional assertions are a step in the right direction even if they don't guard against all possible changes that could lead to breakage. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6851 From dlong at openjdk.java.net Thu Dec 16 23:01:14 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 16 Dec 2021 23:01:14 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 23:11:58 GMT, Doug Simon wrote: > This PR fixes a discrepancy between `MethodData::_trap_hist_list` and `Deoptimization::Reason_LIMIT` in terms of JVMCI specific usage. > > JVMCI doubles the size of the deopt history for a method so that it can distinguish deopts in a normal method compilation from deopts in an OSR compilation: > > union { > intptr_t _align; > u1 _array[JVMCI_ONLY(2 *) MethodData::_trap_hist_limit]; > } _trap_hist; > > > To access the history for OSR deopts, the index for a deopt reason needs to be adjusted to the upper half of the history array. The amount used for the adjustment was incorrect and this PR fixes it: > > > if (update_total_trap_count) { > uint idx = reason; > #if INCLUDE_JVMCI > if (is_osr) { > idx += Reason_TRAP_HISTORY_LENGTH; > } > #endif > > > I introduced `Reason_TRAP_HISTORY_LENGTH` as a replacement for `25 JVMCI_ONLY(+5), // decoupled from Deoptimization::Reason_LIMIT` as this decoupling is unnecessary (as dangerous) as far as I can see. 
Suggest change: use ARRAY_SIZE() It seems a little fragile for asserts checking the array access to know the details of the array size computation (in case it changes in the future): 1999 assert((uint)reason < JVMCI_ONLY(2*) _trap_hist_limit, "oob"); How about using a constant/enum, or better yet, ARRAY_SIZE(_trap_hist._array) instead? ------------- Changes requested by dlong (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6855 From coleenp at openjdk.java.net Thu Dec 16 23:37:32 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 16 Dec 2021 23:37:32 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: <8GQyRbN0TM0QQUFQ4c4qOlNRVyruDnVJ5LIw8V0MAZs=.126cca3c-16ae-42a0-9b4c-869ee0fdd23e@github.com> References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> <8GQyRbN0TM0QQUFQ4c4qOlNRVyruDnVJ5LIw8V0MAZs=.126cca3c-16ae-42a0-9b4c-869ee0fdd23e@github.com> Message-ID: <5K9N7kRTxjdj0_yXDpWl1SqR0FjILHGZsD0r0OvciZw=.588f686e-729e-4deb-ae0a-e7f28e8950fe@github.com> On Thu, 16 Dec 2021 21:49:33 GMT, David Holmes wrote: >> Setting misc_flags to false repeatedly isn't going to break anything wrt. concurrency. >Setting any of the bits concurrently is broken. If one piece of code is turning on bit N and another turning off bit M then the >latter can cause the former to be lost. Yes, that's why I changed the code to do nothing if passed false. Thanks for the code review, David. 
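The two points made in this exchange (a plain read-modify-write on shared bits can lose another thread's update, and a setter that ignores `false` never clears a bit) can be sketched as follows. This is a hedged illustration only: `MiscFlags` and its members are hypothetical names, and `std::atomic` is swapped in for whatever mechanism HotSpot actually uses for its flag words.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag holder; names are illustrative, not InstanceKlass's.
class MiscFlags {
  std::atomic<uint8_t> _bits{0};

 public:
  enum Flag : uint8_t {
    has_nonstatic_fields = 1 << 0,
    has_resolved_methods = 1 << 1
  };

  // Passing false is a no-op, so a caller can forward a parsed bool
  // directly without ever clearing a bit that was already set.
  void set(Flag f, bool value) {
    if (!value) {
      return;
    }
    // Atomic OR: a concurrent set of a different bit cannot be lost, unlike
    // a separate load, modify and store sequence on a plain integer.
    _bits.fetch_or(f, std::memory_order_relaxed);
  }

  bool is_set(Flag f) const {
    return (_bits.load(std::memory_order_relaxed) & f) != 0;
  }
};
```

With this shape, a call site such as `flags.set(MiscFlags::has_nonstatic_fields, parsed_value)` keeps the one-line form discussed above while bits only ever move from 0 to 1.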
------------- PR: https://git.openjdk.java.net/jdk/pull/6851 From iklam at openjdk.java.net Thu Dec 16 23:54:29 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 16 Dec 2021 23:54:29 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes In-Reply-To: <6BsQfcGpBdOwhpBerFKrDF7vfpicS-ZB3XG32_mus74=.ca348cca-f2fe-43d7-8161-1d7b709f1c86@github.com> References: <6BsQfcGpBdOwhpBerFKrDF7vfpicS-ZB3XG32_mus74=.ca348cca-f2fe-43d7-8161-1d7b709f1c86@github.com> Message-ID: On Thu, 16 Dec 2021 17:50:21 GMT, Coleen Phillimore wrote: > Could you also add your unloads a lot test even though it doesn't reproduce this particular error without the ZGC change? It might find a similar bug under stress conditions. OK, I'll add the test case. > src/hotspot/share/cds/dumpTimeClassInfo.inline.hpp line 53: > >> 51: assert_lock_strong(DumpTimeTable_lock); >> 52: if (k->is_loader_alive()) { >> 53: assert(k->is_loader_alive(), "must be"); > > This does seem a bit paranoid and redundant here. Oops, that's was left over code. I'll remove it. > src/hotspot/share/cds/dumpTimeClassInfo.inline.hpp line 58: > >> 56: return result; >> 57: } else { >> 58: if (!SystemDictionaryShared::is_excluded_class(k)) { > > I thought this was the original bug? is_excluded_class() looks at mirror->signers() which if the class isn't alive, mirror->signers() will crash. This has to be in the k->is_loader_alive() too. is_excluded_class() only checks the DumpTimeClassInfo::_is_excluded field. It doesn't examine the mirror->signers(). The crash happened with SystemDictionaryShared::check_excluded_classes(), which does examine the signers. bool SystemDictionaryShared::is_excluded_class(InstanceKlass* k) { assert(_no_class_loading_should_happen, "sanity"); assert_lock_strong(DumpTimeTable_lock); Arguments::assert_is_dumping_archive(); DumpTimeClassInfo* p = find_or_allocate_info_for_locked(k); return (p == NULL) ? 
true : p->is_excluded(); } ------------- PR: https://git.openjdk.java.net/jdk/pull/6859 From iklam at openjdk.java.net Thu Dec 16 23:54:30 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 16 Dec 2021 23:54:30 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes In-Reply-To: References: Message-ID: On Thu, 16 Dec 2021 08:25:44 GMT, Stefan Karlsson wrote: >> Cause of crash: >> >> When dumping a CDS archive, while iterating over entries of the `SystemDictionaryShared::_dumptime_table`, we do not check whether the classes are already unloaded. In the crash, we are trying to call `InstanceKlass::signer()` but the class has already been unloaded. >> >> Fix: >> >> Override the template function `DumpTimeSharedClassTable::iterate` to ensure iteration safety. Do not iterate over a class if its `class_loader_data` is no longer alive. >> >> The assert in `DumpTimeSharedClassTable::IterationHelper` found another existing bug -- we were calling `SystemDictionaryShared::is_dumptime_table_empty()` without holding the `DumpTimeTable_lock`. I delayed the call until we have grabbed the lock. >> >> Testing: >> >> I have attached a test case into the bug report. Without the fix, it would reproduce the same crash in less than a minute. With the fix, the crash is no longer reproducible. >> >> Unfortunately, the test case requires a ZGC patch (thanks to @stefank) that adds delays to increase the likelihood of seeing unloaded classes inside the `_dumptime_table`. Therefore, I cannot integrate the test as a jtreg test. I'll mark the bug as **noreg-hard** > > src/hotspot/share/classfile/systemDictionaryShared.cpp line 194: > >> 192: _dump_in_progress = true; >> 193: } >> 194: > > Did you really intend to set _dump_in_progress to true in stop_dumping()? start_dumping() also sets it to true. Oh, it should be `_dump_in_progress = false;`. I'll fix that. 
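The fix described in the quoted PR text (do not iterate over a class whose loader is no longer alive) can be sketched as a filtering `iterate`. This is a simplified model under stated assumptions: `Entry`, `loader_alive` and `DumpTimeTable` are hypothetical stand-ins, and the locking that the real table requires is omitted.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table entry; `loader_alive` plays the role
// of k->is_loader_alive() in the review comments above.
struct Entry {
  std::string name;
  bool loader_alive;
};

class DumpTimeTable {
  std::vector<Entry> _entries;

 public:
  void add(Entry e) { _entries.push_back(std::move(e)); }

  // Visit only entries whose loader is still alive. Entries for unloaded
  // classes are never handed to the visitor, so the visitor cannot touch
  // their (possibly freed) metadata.
  template <typename Visitor>
  void iterate(Visitor&& visit) const {
    for (const Entry& e : _entries) {
      if (!e.loader_alive) {
        continue;
      }
      visit(e);
    }
  }
};
```

Putting the liveness check inside `iterate` means every caller gets the filter, rather than relying on each visitor to remember it.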
------------- PR: https://git.openjdk.java.net/jdk/pull/6859 From dlong at openjdk.java.net Thu Dec 16 23:58:25 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 16 Dec 2021 23:58:25 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 06:16:23 GMT, Smita Kamath wrote: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. Vladimir, the assert sounds like JDK-6868269. Is there an earlier null check that also needs the reexecute flag? ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From dholmes at openjdk.java.net Fri Dec 17 00:09:25 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 17 Dec 2021 00:09:25 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: <5K9N7kRTxjdj0_yXDpWl1SqR0FjILHGZsD0r0OvciZw=.588f686e-729e-4deb-ae0a-e7f28e8950fe@github.com> References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> <8GQyRbN0TM0QQUFQ4c4qOlNRVyruDnVJ5LIw8V0MAZs=.126cca3c-16ae-42a0-9b4c-869ee0fdd23e@github.com> <5K9N7kRTxjdj0_yXDpWl1SqR0FjILHGZsD0r0OvciZw=.588f686e-729e-4deb-ae0a-e7f28e8950fe@github.com> Message-ID: On Thu, 16 Dec 2021 23:34:54 GMT, Coleen Phillimore wrote: > Yes, that's why I changed the code to do nothing if passed false. Sorry Coleen, that completely escaped my notice. 
:( ------------- PR: https://git.openjdk.java.net/jdk/pull/6851 From jwilhelm at openjdk.java.net Fri Dec 17 00:15:11 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Fri, 17 Dec 2021 00:15:11 GMT Subject: RFR: Merge jdk18 Message-ID: Forwardport JDK 18 -> JDK 19 ------------- Commit messages: - Merge - 8278574: update --help-extra message to include default value of --finalization option - 8278389: SuspendibleThreadSet::_suspend_all should be volatile/atomic - 8278575: update jcmd GC.finalizer_info to list finalization status The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=6873&range=00.0 - jdk18: https://webrevs.openjdk.java.net/?repo=jdk&pr=6873&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/6873/files Stats: 29 lines in 4 files changed: 12 ins; 1 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/6873.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6873/head:pull/6873 PR: https://git.openjdk.java.net/jdk/pull/6873 From iklam at openjdk.java.net Fri Dec 17 00:28:11 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 17 Dec 2021 00:28:11 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes [v2] In-Reply-To: References: Message-ID: > Cause of crash: > > When dumping a CDS archive, while iterating over entries of the `SystemDictionaryShared::_dumptime_table`, we do not check whether the classes are already unloaded. In the crash, we are trying to call `InstanceKlass::signer()` but the class has already been unloaded. > > Fix: > > Override the template function `DumpTimeSharedClassTable::iterate` to ensure iteration safety. Do not iterate over a class if its `class_loader_data` is no longer alive. > > The assert in `DumpTimeSharedClassTable::IterationHelper` found another existing bug -- we were calling `SystemDictionaryShared::is_dumptime_table_empty()` without holding the `DumpTimeTable_lock`. 
I delayed the call until we have grabbed the lock. > > Testing: > > I have attached a test case into the bug report. Without the fix, it would reproduce the same crash in less than a minute. With the fix, the crash is no longer reproducible. > > Unfortunately, the test case requires a ZGC patch (thanks to @stefank) that adds delays to increase the likelihood of seeing unloaded classes inside the `_dumptime_table`. Therefore, I cannot integrate the test as a jtreg test. I'll mark the bug as **noreg-hard** Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: - added test case - @coleenp and @stefank review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6859/files - new: https://git.openjdk.java.net/jdk/pull/6859/files/0075715d..686284fd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6859&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6859&range=00-01 Stats: 173 lines in 4 files changed: 171 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6859.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6859/head:pull/6859 PR: https://git.openjdk.java.net/jdk/pull/6859 From jwilhelm at openjdk.java.net Fri Dec 17 01:11:29 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Fri, 17 Dec 2021 01:11:29 GMT Subject: Integrated: Merge jdk18 In-Reply-To: References: Message-ID: On Fri, 17 Dec 2021 00:05:41 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 18 -> JDK 19 This pull request has now been integrated. 
Changeset: 634afe8c Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/634afe8c5c0855eafb1639f54ecc8e9c9e568814 Stats: 29 lines in 4 files changed: 12 ins; 1 del; 16 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/6873 From svkamath at openjdk.java.net Fri Dec 17 01:14:26 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Fri, 17 Dec 2021 01:14:26 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Thu, 16 Dec 2021 07:24:11 GMT, Vladimir Kozlov wrote: >> The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. > > Unfortunately I hit assert during CTW testing on Windows-x64 when compiling `com.sun.crypto.provider.GaloisCounterMode::implGCMCrypt`. > > > # Internal Error (t:\workspace\open\src\hotspot\share\opto\graphKit.cpp:250), pid=41320, tid=43308 > # assert(ex_map->jvms()->same_calls_as(_exceptions->jvms())) failed: all collected exceptions must come from the same place > > Current CompileTask: > C2: 7834 2950 b 4 com.sun.crypto.provider.GaloisCounterMode::implGCMCrypt (100 bytes) > > Stack: [0x000000c86a600000,0x000000c86a700000] > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [jvm.dll+0xbb90d1] os::platform_print_native_stack+0xf1 (os_windows_x86.cpp:235) > V [jvm.dll+0xdf904e] VMError::report+0x101e (vmError.cpp:828) > V [jvm.dll+0xdfaa4e] VMError::report_and_die+0x7fe (vmError.cpp:1656) > V [jvm.dll+0xdfb1d4] VMError::report_and_die+0x64 (vmError.cpp:1437) > V [jvm.dll+0x536f47] report_vm_error+0xb7 (debug.cpp:280) > V [jvm.dll+0x6da439] GraphKit::add_exception_states_from+0x119 (graphKit.cpp:286) > V [jvm.dll+0x414375] PredicatedIntrinsicGenerator::generate+0x7f5 (callGenerator.cpp:1358) > V [jvm.dll+0x5c3b62] Parse::do_call+0x9c2 
(doCall.cpp:651) > V [jvm.dll+0xbe3745] Parse::do_one_bytecode+0x32b5 (parse2.cpp:2704) > V [jvm.dll+0xbd5ae7] Parse::do_one_block+0x437 (parse1.cpp:1557) > V [jvm.dll+0xbd463c] Parse::do_all_blocks+0x5cc (parse1.cpp:710) > V [jvm.dll+0xbd0c3d] Parse::Parse+0xc1d (parse1.cpp:616) > V [jvm.dll+0x413a15] ParseGenerator::generate+0xa5 (callGenerator.cpp:103) > V [jvm.dll+0x4eba90] Compile::Compile+0x1110 (compile.cpp:714) > > > The call stack has `PredicatedIntrinsicGenerator` so it seems related to changes. > > I got replay file and will try to reproduce it tomorrow. @vnkozlov Since array allocation on heap is causing issues, would it be better if I update the code to have the temp array allocation on stack? The updated code will eliminate the need for re-execution. I have the change ready and can push it if you think it is alright to do so. Please do let me know your thoughts. Thanks. ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From mli at openjdk.java.net Fri Dec 17 01:19:23 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 17 Dec 2021 01:19:23 GMT Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> Message-ID: On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li wrote: > Currently, Thread::_rcu_counter is not padded by cacheline, it should be beneficial to do so. > > The initial specjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015. > > > > ========= test result (1st round) ========== > rcu base > 45096 38980 > 41741 41468 > 42349 41053 > 44485 42030 > 47103 39915 > 43864 36004 > > ==== average ==== > 44106.33333 39908.33333 > > ==== improvement ==== > 10.5% > > ========= test result (2nd round) ========== > Second round of run includes 3 types: > 1. pad gc data & pad rcu > 2. 
pad rcu only > 3. base > > Although the improvement is not that much as the previous round (10%), but still got about 3~4% improvement. > > gc data & rcu rcu base > 41284 41860 37099 > 42296 42166 44692 > 42810 43423 41801 > 43492 45603 40274 > 43808 40641 39627 > 43029 40242 39793 > 42543 41662 41544 > 43420 42702 37991 > 44212 43354 40319 > 42692 43442 45264 > 44773 44577 44213 > 40835 41870 42008 > 44282 44167 42527 > > ==== average ==== > 43036.61538 42746.84615 41319.38462 > > ==== improvement ==== > gc data + rcu / base: 4.156% > rcu / base: 3.45% > > > > > ========= configuration and environment ========== > specjbb arguments: > GROUP_COUNT=4 > TI_JVM_COUNT=1 > > SPEC_OPTS_C="-Dspecjbb.group.count=$GROUP_COUNT -Dspecjbb.txi.pergroup.count=$TI_JVM_COUNT" > SPEC_OPTS_TI="" > SPEC_OPTS_BE="" > > JAVA_OPTS_C="-server -Xms2g -Xmx2g -XX:+UseParallelGC" > JAVA_OPTS_TI="-server -Xms2g -Xmx2g -XX:+UseParallelGC" > JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g" > > MODE_ARGS_C="-ikv" > MODE_ARGS_TI="-ikv" > MODE_ARGS_BE="-ikv" > > NUM_OF_RUNS=1 > > HW: > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Byte Order: Little Endian > CPU(s): 224 > On-line CPU(s) list: 0-223 > Thread(s) per core: 2 > Core(s) per socket: 28 > Socket(s): 4 > NUMA node(s): 4 > Vendor ID: GenuineIntel > CPU family: 6 > Model: 85 > Model name: Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz > Stepping: 4 > CPU MHz: 1001.925 > CPU max MHz: 2101.0000 > CPU min MHz: 1000.0000 > BogoMIPS: 4200.00 > Virtualization: VT-x > L1d cache: 32K > L1i cache: 32K > L2 cache: 1024K > L3 cache: 39424K > NUMA node0 CPU(s): 0-27,112-139 > NUMA node1 CPU(s): 28-55,140-167 > NUMA node2 CPU(s): 56-83,168-195 > NUMA node3 CPU(s): 84-111,196-223 > > total used free shared buff/cache available > Mem: 3.0T 3.8G 2.9T 18M 25G 2.9T > Swap: 99G 0B 99G need further investigation. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6246 From kvn at openjdk.java.net Fri Dec 17 03:17:23 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 17 Dec 2021 03:17:23 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Thu, 16 Dec 2021 23:54:50 GMT, Dean Long wrote: >> The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. > > Vladimir, the assert sounds like JDK-6868269. Is there an earlier null check that also needs the reexecute flag? @dean-long, there is a null check in predicate code: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/callGenerator.cpp#L1268 This is the first time we use `PreserveReexecuteState` for an intrinsic with predicate. But I am actually not sure if the failure I see is caused by these changes or an existing bug. But it causes concern. @smita-kamath yes, please prepare new changes with stack allocation and I will test it. ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From kvn at openjdk.java.net Fri Dec 17 03:17:23 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 17 Dec 2021 03:17:23 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 06:16:23 GMT, Smita Kamath wrote: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. Note, we can't change code in `PredicatedIntrinsicGenerator` because it is used by all intrinsics with predicate and none do re-execution. 
------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From dlong at openjdk.java.net Fri Dec 17 04:41:24 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 17 Dec 2021 04:41:24 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 06:16:23 GMT, Smita Kamath wrote: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. Based on past bugs, it looks like to use new_array here, we need to pass /*deoptimize_on_exception=*/true as the optional 5th argument, just like inline_native_clone does. ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From iklam at openjdk.java.net Fri Dec 17 07:27:45 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 17 Dec 2021 07:27:45 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes [v3] In-Reply-To: References: Message-ID: > Cause of crash: > > When dumping a CDS archive, while iterating over entries of the `SystemDictionaryShared::_dumptime_table`, we do not check whether the classes are already unloaded. In the crash, we are trying to call `InstanceKlass::signer()` but the class has already been unloaded. > > Fix: > > Override the template function `DumpTimeSharedClassTable::iterate` to ensure iteration safety. Do not iterate over a class if its `class_loader_data` is no longer alive. > > The assert in `DumpTimeSharedClassTable::IterationHelper` found another existing bug -- we were calling `SystemDictionaryShared::is_dumptime_table_empty()` without holding the `DumpTimeTable_lock`. I delayed the call until we have grabbed the lock. > > Testing: > > I have attached a test case into the bug report. Without the fix, it would reproduce the same crash in less than a minute. With the fix, the crash is no longer reproducible. 
> > Unfortunately, the test case requires a ZGC patch (thanks to @stefank) that adds delays to increase the likelihood of seeing unloaded classes inside the `_dumptime_table`. Therefore, I cannot integrate the test as a jtreg test. I'll mark the bug as **noreg-hard** Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Merge branch 'master' into 8278602-cds-zgc-class-unload-bug - added test case - @coleenp and @stefank review comments - cleaned up code - add #if INCLUDE_CDS - Merge branch 'master' into 8278602-cds-zgc-class-unload-bug - using k->is_loader_alive() is enough - Added DumpTimeSharedClassTable::iterate() to make sure every iteration goes through EligibleClassIterationHelper - step1 ------------- Changes: https://git.openjdk.java.net/jdk/pull/6859/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6859&range=02 Stats: 274 lines in 8 files changed: 266 ins; 4 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6859.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6859/head:pull/6859 PR: https://git.openjdk.java.net/jdk/pull/6859 From rehn at openjdk.java.net Fri Dec 17 08:40:55 2021 From: rehn at openjdk.java.net (Robbin Ehn) Date: Fri, 17 Dec 2021 08:40:55 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() Message-ID: Please consider this enhancement. This makes Thread.currentThread() eight times faster on my box when running in interpreter. Passes t1-t4 As suggested I added a related fix to Shenandoah. Shenandoah LB was using InterpreterMacroAssembler version of call_VM_leaf_base (it's virtual). The interpreter version adds a check on last_sp, since the intrinsic is not setting up a new frame, this check is faulty. Other GC seems to always use the base version, so let's use the base version in Shenandoah also. No issues found when locally running gc/shenandoah. 
------------- Commit messages: - We must use the MacroAssembler version of the call VM when not having a proper interpreter frame - Intrinsic Changes: https://git.openjdk.java.net/jdk/pull/6833/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6833&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278793 Stats: 35 lines in 6 files changed: 31 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6833.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6833/head:pull/6833 PR: https://git.openjdk.java.net/jdk/pull/6833 From rehn at openjdk.java.net Fri Dec 17 08:40:55 2021 From: rehn at openjdk.java.net (Robbin Ehn) Date: Fri, 17 Dec 2021 08:40:55 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 15:08:54 GMT, Robbin Ehn wrote: > Please consider this enhancement. > This makes Thread.currentThread() eight times faster on my box when running in interpreter. > > Passes t1-t4 > > As suggested I added a related fix to Shenandoah. > Shenandoah LB was using InterpreterMacroAssembler version of call_VM_leaf_base (it's virtual). > The interpreter version adds a check on last_sp, since the intrinsic is not setting up a new frame, this check is faulty. > Other GC seems to always use the base version, so let's use the base version in Shenandoah also. > No issues found when locally running gc/shenandoah. @rkennke can you please review this, thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6833 From rkennke at openjdk.java.net Fri Dec 17 09:57:25 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 17 Dec 2021 09:57:25 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 15:08:54 GMT, Robbin Ehn wrote: > Please consider this enhancement. > This makes Thread.currentThread() eight times faster on my box when running in interpreter. 
> > Passes t1-t4 > > As suggested I added a related fix to Shenandoah. > Shenandoah LB was using InterpreterMacroAssembler version of call_VM_leaf_base (it's virtual). > The interpreter version adds a check on last_sp, since the intrinsic is not setting up a new frame, this check is faulty. > Other GC seems to always use the base version, so let's use the base version in Shenandoah also. > No issues found when locally running gc/shenandoah. Hi Robbin, Looks mostly good, one small comment. src/hotspot/cpu/x86/templateInterpreterGenerator_x86_64.cpp line 456: > 454: // Only IN_HEAP loads require a thread_tmp register > 455: __ access_load_at(T_OBJECT, IN_NATIVE, rax, > 456: Address(rscratch2, 0), rscratch1, noreg); I don't like the hardcoded 0 here. You should use MacroAssembler::resolve_oop_handle() instead. (That has the 0 too, I believe this should be changed to a constant like OopHandle::obj_offset_in_bytes() or so, but that is another story.) ------------- Changes requested by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6833 From duke at openjdk.java.net Fri Dec 17 11:44:40 2021 From: duke at openjdk.java.net (Bhavana-Kilambi) Date: Fri, 17 Dec 2021 11:44:40 GMT Subject: RFR: 8239927: Product variable PrefetchFieldsAhead is unused and should be removed [v2] In-Reply-To: References: Message-ID: > The product variable "PrefetchFieldsAhead" is defined in gc_globals.hpp and set in vm_version_x86.cpp. > But as it's not used anywhere, removing this option from the JDK source. Bhavana-Kilambi has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: 8239927: Product variable PrefetchFieldsAhead is unused and should be removed ------------- Changes: https://git.openjdk.java.net/jdk/pull/6783/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6783&range=01 Stats: 11 lines in 3 files changed: 1 ins; 10 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6783.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6783/head:pull/6783 PR: https://git.openjdk.java.net/jdk/pull/6783 From rehn at openjdk.java.net Fri Dec 17 12:02:49 2021 From: rehn at openjdk.java.net (Robbin Ehn) Date: Fri, 17 Dec 2021 12:02:49 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() [v2] In-Reply-To: References: Message-ID: <7rTs9aXIVR59TahwoJbmn9gU-JYzRS8XlySAUqNujgQ=.baeed9fa-5608-477f-9e5b-d3af7044000b@github.com> > Please consider this enhancement. > This makes Thread.currentThread() eight times faster on my box when running in interpreter. > > Passes t1-t4 > > As suggested I added a related fix to Shenandoah. > Shenandoah LB was using InterpreterMacroAssembler version of call_VM_leaf_base (it's virtual). > The interpreter version adds a check on last_sp, since the intrinsic is not setting up a new frame, this check is faulty. > Other GC seems to always use the base version, so let's use the base version in Shenandoah also. > No issues found when locally running gc/shenandoah. 
Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Use resolve_oop_handle instead ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6833/files - new: https://git.openjdk.java.net/jdk/pull/6833/files/e0d699c3..976baf2b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6833&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6833&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6833.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6833/head:pull/6833 PR: https://git.openjdk.java.net/jdk/pull/6833 From rehn at openjdk.java.net Fri Dec 17 12:02:51 2021 From: rehn at openjdk.java.net (Robbin Ehn) Date: Fri, 17 Dec 2021 12:02:51 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() [v2] In-Reply-To: References: Message-ID: On Fri, 17 Dec 2021 09:54:28 GMT, Roman Kennke wrote: > Hi Robbin, Looks mostly good, one small comment. Thanks for having a look! > src/hotspot/cpu/x86/templateInterpreterGenerator_x86_64.cpp line 456: > >> 454: // Only IN_HEAP loads require a thread_tmp register >> 455: __ access_load_at(T_OBJECT, IN_NATIVE, rax, >> 456: Address(rscratch2, 0), rscratch1, noreg); > > I don't like the hardcoded 0 here. You should use MacroAssembler::resolve_oop_handle() instead. (That has the 0 too, I believe this should be changed to a constant like OopHandle::obj_offset_in_bytes() or so, but that is another story.) Yes, thanks, agreed, fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6833 From duke at openjdk.java.net Fri Dec 17 12:03:37 2021 From: duke at openjdk.java.net (Bhavana-Kilambi) Date: Fri, 17 Dec 2021 12:03:37 GMT Subject: RFR: 8239927: Product variable PrefetchFieldsAhead is unused and should be removed [v3] In-Reply-To: References: Message-ID: > The product variable "PrefetchFieldsAhead" is defined in gc_globals.hpp and set in vm_version_x86.cpp. 
> But as it's not used anywhere, removing this option from the JDK source. Bhavana-Kilambi has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8239927: Product variable PrefetchFieldsAhead is unused and should be removed ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6783/files - new: https://git.openjdk.java.net/jdk/pull/6783/files/ec81c677..bd81befb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6783&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6783&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6783.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6783/head:pull/6783 PR: https://git.openjdk.java.net/jdk/pull/6783 From tschatzl at openjdk.java.net Fri Dec 17 12:07:20 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 17 Dec 2021 12:07:20 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: References: Message-ID: On Thu, 16 Dec 2021 09:29:09 GMT, Hamlin Li wrote: > The original PR is at #6763, which should be retired as we have decided to adjust part of optimization solution for evacuation failure (see #6627 for details), so the log will be adjusted accordingly. > The basic log related to evacuation failure will look like below based on this patch. > > > [13.126s][debug][gc,phases] GC(0) Restore Retained Regions (ms): Min: 0.0, Avg: 197.4, Max: 1579.1, Diff: 1579.1, Sum: 1579.1, Workers: 8 > [13.126s][debug][gc,phases] GC(0) Evacuation Failure Regions: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 1, Workers: 1 Lgtm apart from the omission of that log message check. I assume that it's just not worth adding log messages for the sub phases (sorting, ...) 
of the evacuation failure handling since we decided on the other option using the prev bitmap already. That's fine. In any case we can always improve these messages. test/hotspot/jtreg/gc/g1/TestGCLogMessages.java line 268: > 266: new LogMessageWithLevel("Recalculate Used Memory", Level.DEBUG), > 267: new LogMessageWithLevel("Restore Preserved Marks", Level.DEBUG), > 268: new LogMessageWithLevel("Restore Retained Regions", Level.DEBUG), Please also check the work item in this test. Needs to be added just here. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6860 From dnsimon at openjdk.java.net Fri Dec 17 12:14:25 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 17 Dec 2021 12:14:25 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob In-Reply-To: References: Message-ID: On Thu, 16 Dec 2021 22:57:57 GMT, Dean Long wrote: > Suggest change: use ARRAY_SIZE() Thanks, I was not aware of the `ARRAY_SIZE` macro. I'll also add this macro when backporting this issue to 11. ------------- PR: https://git.openjdk.java.net/jdk/pull/6855 From dnsimon at openjdk.java.net Fri Dec 17 12:53:06 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 17 Dec 2021 12:53:06 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob [v2] In-Reply-To: References: Message-ID: > This PR fixes a discrepancy between `MethodData::_trap_hist_list` and `Deoptimization::Reason_LIMIT` in terms of JVMCI specific usage. > > JVMCI doubles the size of the deopt history for a method so that it can distinguish deopts in a normal method compilation from deopts in an OSR compilation: > > union { > intptr_t _align; > u1 _array[JVMCI_ONLY(2 *) MethodData::_trap_hist_limit]; > } _trap_hist; > > > To access the history for OSR deopts, the index for a deopt reason needs to be adjusted to the upper half of the history array. 
The amount used for the adjustment was incorrect and this PR fixes it: > > > if (update_total_trap_count) { > uint idx = reason; > #if INCLUDE_JVMCI > if (is_osr) { > idx += Reason_TRAP_HISTORY_LENGTH; > } > #endif > > > I introduced `Reason_TRAP_HISTORY_LENGTH` as a replacement for `25 JVMCI_ONLY(+5), // decoupled from Deoptimization::Reason_LIMIT` as this decoupling is unnecessary (and dangerous) as far as I can see. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: use ARRAY_SIZE macro and add comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6855/files - new: https://git.openjdk.java.net/jdk/pull/6855/files/d7495903..c07e1778 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6855&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6855&range=00-01 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6855/head:pull/6855 PR: https://git.openjdk.java.net/jdk/pull/6855 From duke at openjdk.java.net Fri Dec 17 13:37:26 2021 From: duke at openjdk.java.net (Tobias Holenstein) Date: Fri, 17 Dec 2021 13:37:26 GMT Subject: RFR: JDK-8278329: some TraceDeoptimization code not included in PRODUCT build In-Reply-To: References: Message-ID: On Thu, 16 Dec 2021 17:56:37 GMT, Tom Rodriguez wrote: > Thanks for cleaning this up. I think one or two things should be moved under PrintDeoptimizationDetails for consistency but I wanted to look more closely at the output before adding my comments. I'll have some comments up soon. Sure I will wait for your comment. (It could make sense to move some parts to `PrintDeoptimizationDetails`) Note: `PrintDeoptimizationDetails` is not a diagnostic flag at the moment. I didn't change that in this PR because JDK-8278329 is typed as a Bug. I think if that is desired, it's best to change it in a separate enhancement issue. 
Also `PrintDeoptimizationDetails` can be combined with `Verbose` and `WizardMode` which are also not diagnostic flags. ------------- PR: https://git.openjdk.java.net/jdk/pull/6746 From coleenp at openjdk.java.net Fri Dec 17 13:45:29 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 17 Dec 2021 13:45:29 GMT Subject: RFR: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues [v3] In-Reply-To: References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> <8GQyRbN0TM0QQUFQ4c4qOlNRVyruDnVJ5LIw8V0MAZs=.126cca3c-16ae-42a0-9b4c-869ee0fdd23e@github.com> <5K9N7kRTxjdj0_yXDpWl1SqR0FjILHGZsD0r0OvciZw=.588f686e-729e-4deb-ae0a-e7f28e8950fe@github.com> Message-ID: On Fri, 17 Dec 2021 00:06:25 GMT, David Holmes wrote: > Sorry Coleen, that completely escaped my notice. :( Rarely does that happen. Thanks for the code review Harold and David. ------------- PR: https://git.openjdk.java.net/jdk/pull/6851 From coleenp at openjdk.java.net Fri Dec 17 13:45:30 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 17 Dec 2021 13:45:30 GMT Subject: Integrated: 8277216: Examine InstanceKlass::_misc_flags for concurrency issues In-Reply-To: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> References: <1ji1h1EZixh1ub15-xKS9R6QFaANvRuOdHWQWpfY0FE=.052c63f5-10eb-4289-999f-a80d05c79de7@github.com> Message-ID: On Wed, 15 Dec 2021 16:14:32 GMT, Coleen Phillimore wrote: > Recent bug in misc_flags showed that they are not set concurrently and could cause bugs. Most of the misc_flags are set at classfile parsing time or at a safepoint and never reset. This change adds an assert that the flag is set once. See CR for more details. > Tested with tier1-3. This pull request has now been integrated. 
Changeset: 3607a5cd Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/3607a5cdd9a3013851c8faefd346f04934f897e6 Stats: 32 lines in 2 files changed: 10 ins; 17 del; 5 mod 8277216: Examine InstanceKlass::_misc_flags for concurrency issues Reviewed-by: hseigel, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6851 From dnsimon at openjdk.java.net Fri Dec 17 14:00:21 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 17 Dec 2021 14:00:21 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob [v2] In-Reply-To: References: Message-ID: On Thu, 16 Dec 2021 18:27:42 GMT, Vladimir Kozlov wrote: > Looks good. I suggest to do regular testing too since change affects shared code. I've run `hs-tier1,hs-tier2,hs-tier3` testing. ------------- PR: https://git.openjdk.java.net/jdk/pull/6855 From rkennke at openjdk.java.net Fri Dec 17 14:41:26 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 17 Dec 2021 14:41:26 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() [v2] In-Reply-To: <7rTs9aXIVR59TahwoJbmn9gU-JYzRS8XlySAUqNujgQ=.baeed9fa-5608-477f-9e5b-d3af7044000b@github.com> References: <7rTs9aXIVR59TahwoJbmn9gU-JYzRS8XlySAUqNujgQ=.baeed9fa-5608-477f-9e5b-d3af7044000b@github.com> Message-ID: On Fri, 17 Dec 2021 12:02:49 GMT, Robbin Ehn wrote: >> Please consider this enhancement. >> This makes Thread.currentThread() eight times faster on my box when running in interpreter. >> >> Passes t1-t4 >> >> As suggested I added a related fix to Shenandoah. >> Shenandoah LB was using InterpreterMacroAssembler version of call_VM_leaf_base (it's virtual). >> The interpreter version adds a check on last_sp, since the intrinsic is not setting up a new frame, this check is faulty. >> Other GC seems to always use the base version, so let's use the base version in Shenandoah also. >> No issues found when locally running gc/shenandoah. 
> > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use resolve_oop_handle instead Looks good, thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6833 From ccheung at openjdk.java.net Fri Dec 17 15:59:28 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Fri, 17 Dec 2021 15:59:28 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes [v3] In-Reply-To: References: Message-ID: <8K9NnEPUVe3frVBGDI40WtbV9tOUl3SUD1K6OXoaepI=.7170ba6f-ad1b-42d9-a247-fb209c03fe0c@github.com> On Fri, 17 Dec 2021 07:27:45 GMT, Ioi Lam wrote: >> Cause of crash: >> >> When dumping a CDS archive, while iterating over entries of the `SystemDictionaryShared::_dumptime_table`, we do not check whether the classes are already unloaded. In the crash, we are trying to call `InstanceKlass::signer()` but the class has already been unloaded. >> >> Fix: >> >> Override the template function `DumpTimeSharedClassTable::iterate` to ensure iteration safety. Do not iterate over a class if its `class_loader_data` is no longer alive. >> >> The assert in `DumpTimeSharedClassTable::IterationHelper` found another existing bug -- we were calling `SystemDictionaryShared::is_dumptime_table_empty()` without holding the `DumpTimeTable_lock`. I delayed the call until we have grabbed the lock. >> >> Testing: >> >> I have attached a test case into the bug report. Without the fix, it would reproduce the same crash in less than a minute. With the fix, the crash is no longer reproducible. >> >> Unfortunately, the test case requires a ZGC patch (thanks to @stefank) that adds delays to increase the likelihood of seeing unloaded classes inside the `_dumptime_table`. Therefore, I cannot integrate the test as a jtreg test. I'll mark the bug as **noreg-hard** > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: > > - Merge branch 'master' into 8278602-cds-zgc-class-unload-bug > - added test case > - @coleenp and @stefank review comments > - cleaned up code > - add #if INCLUDE_CDS > - Merge branch 'master' into 8278602-cds-zgc-class-unload-bug > - using k->is_loader_alive() is enough > - Added DumpTimeSharedClassTable::iterate() to make sure every iteration goes through EligibleClassIterationHelper > - step1 Looks good. Just couple of questions on the test. test/hotspot/jtreg/runtime/cds/appcds/dynamicArchive/test-classes/LotsUnloadApp.java line 81: > 79: static String x; > 80: static double d = 123; > 81: static float f = 456; Are the above declarations needed? They are not being used. test/hotspot/jtreg/runtime/cds/appcds/dynamicArchive/test-classes/LotsUnloadApp.java line 88: > 86: public void doit(Runnable r) { > 87: r.run(); > 88: } I don't see the above method being called. ------------- PR: https://git.openjdk.java.net/jdk/pull/6859 From coleenp at openjdk.java.net Fri Dec 17 16:26:22 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 17 Dec 2021 16:26:22 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes [v3] In-Reply-To: References: <6BsQfcGpBdOwhpBerFKrDF7vfpicS-ZB3XG32_mus74=.ca348cca-f2fe-43d7-8161-1d7b709f1c86@github.com> Message-ID: On Thu, 16 Dec 2021 23:50:57 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/dumpTimeClassInfo.inline.hpp line 58: >> >>> 56: return result; >>> 57: } else { >>> 58: if (!SystemDictionaryShared::is_excluded_class(k)) { >> >> I thought this was the original bug? is_excluded_class() looks at mirror->signers() which if the class isn't alive, mirror->signers() will crash. This has to be in the k->is_loader_alive() too. > > is_excluded_class() only checks the DumpTimeClassInfo::_is_excluded field. It doesn't examine the mirror->signers(). The crash happened with SystemDictionaryShared::check_excluded_classes(), which does examine the signers. 
> > > bool SystemDictionaryShared::is_excluded_class(InstanceKlass* k) { > assert(_no_class_loading_should_happen, "sanity"); > assert_lock_strong(DumpTimeTable_lock); > Arguments::assert_is_dumping_archive(); > DumpTimeClassInfo* p = find_or_allocate_info_for_locked(k); > return (p == NULL) ? true : p->is_excluded(); > } Ok, sorry I got the names confused. ------------- PR: https://git.openjdk.java.net/jdk/pull/6859 From coleenp at openjdk.java.net Fri Dec 17 16:26:22 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 17 Dec 2021 16:26:22 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes [v3] In-Reply-To: References: Message-ID: On Fri, 17 Dec 2021 07:27:45 GMT, Ioi Lam wrote: >> Cause of crash: >> >> When dumping a CDS archive, while iterating over entries of the `SystemDictionaryShared::_dumptime_table`, we do not check whether the classes are already unloaded. In the crash, we are trying to call `InstanceKlass::signer()` but the class has already been unloaded. >> >> Fix: >> >> Override the template function `DumpTimeSharedClassTable::iterate` to ensure iteration safety. Do not iterate over a class if its `class_loader_data` is no longer alive. >> >> The assert in `DumpTimeSharedClassTable::IterationHelper` found another existing bug -- we were calling `SystemDictionaryShared::is_dumptime_table_empty()` without holding the `DumpTimeTable_lock`. I delayed the call until we have grabbed the lock. >> >> Testing: >> >> I have attached a test case into the bug report. Without the fix, it would reproduce the same crash in less than a minute. With the fix, the crash is no longer reproducible. >> >> Unfortunately, the test case requires a ZGC patch (thanks to @stefank) that adds delays to increase the likelihood of seeing unloaded classes inside the `_dumptime_table`. Therefore, I cannot integrate the test as a jtreg test. 
I'll mark the bug as **noreg-hard** > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Merge branch 'master' into 8278602-cds-zgc-class-unload-bug > - added test case > - @coleenp and @stefank review comments > - cleaned up code > - add #if INCLUDE_CDS > - Merge branch 'master' into 8278602-cds-zgc-class-unload-bug > - using k->is_loader_alive() is enough > - Added DumpTimeSharedClassTable::iterate() to make sure every iteration goes through EligibleClassIterationHelper > - step1 Looks good. Thank you for adding the test. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6859 From svkamath at openjdk.java.net Fri Dec 17 18:10:00 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Fri, 17 Dec 2021 18:10:00 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 [v2] In-Reply-To: References: Message-ID: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Fix to allocate 48 additonal htbl entries in the stub. 
------------- Changes: - all: https://git.openjdk.java.net/jdk18/pull/19/files - new: https://git.openjdk.java.net/jdk18/pull/19/files/c9c73d1f..f727ba16 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk18&pr=19&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk18&pr=19&range=00-01 Stats: 58 lines in 11 files changed: 3 ins; 45 del; 10 mod Patch: https://git.openjdk.java.net/jdk18/pull/19.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/19/head:pull/19 PR: https://git.openjdk.java.net/jdk18/pull/19 From iklam at openjdk.java.net Fri Dec 17 18:23:07 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 17 Dec 2021 18:23:07 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes [v3] In-Reply-To: <8K9NnEPUVe3frVBGDI40WtbV9tOUl3SUD1K6OXoaepI=.7170ba6f-ad1b-42d9-a247-fb209c03fe0c@github.com> References: <8K9NnEPUVe3frVBGDI40WtbV9tOUl3SUD1K6OXoaepI=.7170ba6f-ad1b-42d9-a247-fb209c03fe0c@github.com> Message-ID: <5_kOaBjm3XyqEjiy23Wr8-RVHqxvi7TEHHEF81RIjLY=.0769def0-e2f5-40a1-8e97-7253c790e813@github.com> On Fri, 17 Dec 2021 15:55:12 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: >> >> - Merge branch 'master' into 8278602-cds-zgc-class-unload-bug >> - added test case >> - @coleenp and @stefank review comments >> - cleaned up code >> - add #if INCLUDE_CDS >> - Merge branch 'master' into 8278602-cds-zgc-class-unload-bug >> - using k->is_loader_alive() is enough >> - Added DumpTimeSharedClassTable::iterate() to make sure every iteration goes through EligibleClassIterationHelper >> - step1 > > test/hotspot/jtreg/runtime/cds/appcds/dynamicArchive/test-classes/LotsUnloadApp.java line 81: > >> 79: static String x; >> 80: static double d = 123; >> 81: static float f = 456; > > Are the above declarations needed? They are not being used. These are leftovers. I've removed them.
> test/hotspot/jtreg/runtime/cds/appcds/dynamicArchive/test-classes/LotsUnloadApp.java line 88: > >> 86: public void doit(Runnable r) { >> 87: r.run(); >> 88: } > > I don't see the above method being called. I've removed it. ------------- PR: https://git.openjdk.java.net/jdk/pull/6859 From iklam at openjdk.java.net Fri Dec 17 18:23:01 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 17 Dec 2021 18:23:01 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes [v4] In-Reply-To: References: Message-ID: > Cause of crash: > > When dumping a CDS archive, while iterating over entries of the `SystemDictionaryShared::_dumptime_table`, we do not check whether the classes are already unloaded. In the crash, we are trying to call `InstanceKlass::signer()` but the class has already been unloaded. > > Fix: > > Override the template function `DumpTimeSharedClassTable::iterate` to ensure iteration safety. Do not iterate over a class if its `class_loader_data` is no longer alive. > > The assert in `DumpTimeSharedClassTable::IterationHelper` found another existing bug -- we were calling `SystemDictionaryShared::is_dumptime_table_empty()` without holding the `DumpTimeTable_lock`. I delayed the call until we have grabbed the lock. > > Testing: > > I have attached a test case into the bug report. Without the fix, it would reproduce the same crash in less than a minute. With the fix, the crash is no longer reproducible. > > Unfortunately, the test case requires a ZGC patch (thanks to @stefank) that adds delays to increase the likelihood of seeing unloaded classes inside the `_dumptime_table`. Therefore, I cannot integrate the test as a jtreg test. 
I'll mark the bug as **noreg-hard** Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @calvinccheung comments -- removed unused code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6859/files - new: https://git.openjdk.java.net/jdk/pull/6859/files/8584389e..ea1f318b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6859&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6859&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6859.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6859/head:pull/6859 PR: https://git.openjdk.java.net/jdk/pull/6859 From ccheung at openjdk.java.net Fri Dec 17 18:26:25 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Fri, 17 Dec 2021 18:26:25 GMT Subject: RFR: 8278602: CDS dynamic dump may access unloaded classes [v4] In-Reply-To: References: Message-ID: <5kLpotz3_j0ke1Bt2TiJ7CgDtgHuO8ajOyNDbG4rGZM=.96e932c9-9240-417e-9dd5-8e86dc3c83c9@github.com> On Fri, 17 Dec 2021 18:23:01 GMT, Ioi Lam wrote: >> Cause of crash: >> >> When dumping a CDS archive, while iterating over entries of the `SystemDictionaryShared::_dumptime_table`, we do not check whether the classes are already unloaded. In the crash, we are trying to call `InstanceKlass::signer()` but the class has already been unloaded. >> >> Fix: >> >> Override the template function `DumpTimeSharedClassTable::iterate` to ensure iteration safety. Do not iterate over a class if its `class_loader_data` is no longer alive. >> >> The assert in `DumpTimeSharedClassTable::IterationHelper` found another existing bug -- we were calling `SystemDictionaryShared::is_dumptime_table_empty()` without holding the `DumpTimeTable_lock`. I delayed the call until we have grabbed the lock. >> >> Testing: >> >> I have attached a test case into the bug report. Without the fix, it would reproduce the same crash in less than a minute. 
With the fix, the crash is no longer reproducible. >> >> Unfortunately, the test case requires a ZGC patch (thanks to @stefank) that adds delays to increase the likelihood of seeing unloaded classes inside the `_dumptime_table`. Therefore, I cannot integrate the test as a jtreg test. I'll mark the bug as **noreg-hard** > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @calvinccheung comments -- removed unused code Marked as reviewed by ccheung (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6859 From kvn at openjdk.java.net Fri Dec 17 19:16:25 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 17 Dec 2021 19:16:25 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob [v2] In-Reply-To: References: Message-ID: On Fri, 17 Dec 2021 12:53:06 GMT, Doug Simon wrote: >> This PR fixes a discrepancy between `MethodData::_trap_hist_list` and `Deoptimization::Reason_LIMIT` in terms of JVMCI specific usage. >> >> JVMCI doubles the size of the deopt history for a method so that it can distinguish deopts in a normal method compilation from deopts in an OSR compilation: >> >> union { >> intptr_t _align; >> u1 _array[JVMCI_ONLY(2 *) MethodData::_trap_hist_limit]; >> } _trap_hist; >> >> >> To access the history for OSR deopts, the index for a deopt reason needs to be adjusted to the upper half of the history array. The amount used for the adjustment was incorrect and this PR fixes it: >> >> >> if (update_total_trap_count) { >> uint idx = reason; >> #if INCLUDE_JVMCI >> if (is_osr) { >> idx += Reason_TRAP_HISTORY_LENGTH; >> } >> #endif >> >> >> I introduced `Reason_TRAP_HISTORY_LENGTH` as a replacement for `25 JVMCI_ONLY(+5), // decoupled from Deoptimization::Reason_LIMIT` as this decoupling is unnecessary (as dangerous) as far as I can see. 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use ARRAY_SIZE macro and add comments Update looks good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6855 From kvn at openjdk.java.net Fri Dec 17 19:30:32 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 17 Dec 2021 19:30:32 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 [v2] In-Reply-To: References: Message-ID: <-hoeuB7TNw0hDUF1Uy2YyYGPEIpfZ7VckcXDUiWed10=.d6c9d532-a1c6-454a-9918-39c743a3d82d@github.com> On Fri, 17 Dec 2021 18:10:00 GMT, Smita Kamath wrote: >> The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Fix to allocate 48 additonal htbl entries in the stub. I have few comments. I will start testing. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4444: > 4442: __ movptr(state, state_mem); > 4443: #endif > 4444: __ subptr(rsp, 96 * longSize); // Create space on the stack for htbl entries Is this aligned correctly? Or alignment does not matter? src/hotspot/share/opto/library_call.cpp line 6766: > 6764: Node* ghash_object = argument(8); > 6765: > 6766: // (1) in, ct and out are arrays. You need to restore indent. 
-------------

PR: https://git.openjdk.java.net/jdk18/pull/19

From never at openjdk.java.net Fri Dec 17 20:54:26 2021
From: never at openjdk.java.net (Tom Rodriguez)
Date: Fri, 17 Dec 2021 20:54:26 GMT
Subject: RFR: JDK-8278329: some TraceDeoptimization code not included in PRODUCT build
In-Reply-To:
References:
Message-ID:

On Tue, 7 Dec 2021 14:46:05 GMT, Tobias Holenstein wrote:

> After "JDK-8154011: Make `TraceDeoptimization` a diagnostic flag" some code was not included in the PRODUCT build.
>
> Removed all the #ifndef PRODUCT guards around `TraceDeoptimization` checks and made sure to be consistent.
>
> The DEOPT PACKING messages were controlled by `PrintDeoptimizationDetails` (develop flag), but DEOPT UNPACKING is controlled by `TraceDeoptimization` (product flag). Therefore changed DEOPT PACKING messages to be controlled by `TraceDeoptimization` as well.
>
> Checked that tests are not affected. Checked on Aurora that performance is not affected.

We can convert JDK-8278329 into an RFE if we need to. Creating yet another bug just complicates things. For debugging output the distinction between a bug and an RFE is pretty small anyway.

The first thing I notice is that in a release build we get `[CodeBlob]` in this output, which isn't very helpful.

    DEOPT PACKING thread 0x00007fae8000dc00 Compiled frame (sp=0x000070000dd06ee0 unextended sp=0x000070000dd06ee0, fp=0x000000000000003a, real_fp=0x000070000dd06f10, pc=0x0000000116950388)
    [CodeBlob]
       Virtual frames (innermost first):

In fastdebug we get output like:

    nmethod 2351 1042 4 jdk.internal.misc.Unsafe::allocateUninitializedArray (55 bytes)

so I think the code is using a print function that doesn't exist in product. That said, I don't think that line of output is helpful since it reiterates the information in the trap or packing messages, so I'd be inclined to delete it.
A full pack/unpack sequence looks like this:

    Uncommon trap bci=2 pc=0x00000001169ba620, relative_pc=0x00000000000005c0, method=scala.collection.mutable.HashTable$class.elemEquals(Lscala/collection/mutable/HashTable;Ljava/lang/Object;Ljava/lang/Object;)Z, debug_id=0
    Uncommon trap occurred in scala.collection.mutable.HashTable$class::findEntry compiler=c2 compile_id=1418 (@0x00000001169ba620) thread=7171 reason=unstable_if action=reinterpret unloaded_class_index=-1 debug_id=0
    DEOPT PACKING thread 0x00007fae8000dc00 Compiled frame (sp=0x000070000dd06230 unextended sp=0x000070000dd06230, fp=0x00000006016a06e0, real_fp=0x000070000dd06280, pc=0x00000001169ba620)
    [CodeBlob]
       Virtual frames (innermost first):
          0 - (0x00007fae90293010) - if_acmpne @ bci 2
          1 - (0x00007fae90294328) - invokestatic @ bci 3
          2 - (0x00007fae90295640) - invokeinterface @ bci 35
         Created vframeArray 0x00007fadf1041800
    DEOPT UNPACKING thread 0x00007fae8000dc00 vframeArray 0x00007fadf1041800 mode 2
    {method}
    {0x0000000133ac41a8} 'findEntry' '(Lscala/collection/mutable/HashTable;Ljava/lang/Object;)Lscala/collection/mutable/HashEntry;' in 'scala/collection/mutable/HashTable$class'
     - invokeinterface @ bci 35 sp = 0x000070000dd061f0

    {method}
    {0x0000000133824bf8} 'elemEquals' '(Ljava/lang/Object;Ljava/lang/Object;)Z' in 'scala/collection/mutable/HashMap'
     - invokestatic @ bci 3 sp = 0x000070000dd06180

    {method}
    {0x0000000133ac4a60} 'elemEquals' '(Lscala/collection/mutable/HashTable;Ljava/lang/Object;Ljava/lang/Object;)Z' in 'scala/collection/mutable/HashTable$class'
     - if_acmpne @ bci 2 sp = 0x000070000dd06118

The `{method}` lines correspond to the vframes in the `PACKING` step, so it would be nice if they were printed in a similar way, without the extra blank line in between. We should also use a different printing function so they are printed in a more natural way, like class.name(parameters), without the `{method}` part.
So I'd recommend moving the `DEOPT UNPACKING` printing into `vframeArray::unpack_to_stack` and trying to make the output look similar between the two. The unpacking step just adds information about the sp used in the recreated interpreter frame. Maybe something like this:

    DEOPT UNPACKING thread 0x00007fae8000dc00 vframeArray 0x00007fadf1041800 mode 2
       Virtual frames (innermost first):
          0 - {0x0000000133ac41a8} scala/collection/mutable/HashTable$class.findEntry(Lscala/collection/mutable/HashTable;Ljava/lang/Object;)Lscala/collection/mutable/HashEntry; - invokeinterface @ bci 35 sp =
          1 - {0x0000000133824bf8} scala/collection/mutable/HashMap.elemEquals(Ljava/lang/Object;Ljava/lang/Object;)Z - invokestatic @ bci 3 sp =
          2 - {0x0000000133ac4a60} scala/collection/mutable/HashTable$class.elemEquals(Lscala/collection/mutable/HashTable;Ljava/lang/Object;Ljava/lang/Object;)Z - if_acmpne @ bci 2 sp =

and update the vframe printing to include similar information about the actual method?

There's also the issue of two `Uncommon trap` messages for every trap that show slightly different information. A single message would be clearer, but maybe there's some good reason for the double printing that I'm missing.

I can prepare a changeset with my suggestions if it's unclear what I'm asking for.

I'm fine with the current state of PrintDeoptimizationDetails being non-product, but I'm surprised no one has finally deleted the `Verbose` and `WizardMode` flags. Those are some ancient artifacts that should probably be purged.
------------- PR: https://git.openjdk.java.net/jdk/pull/6746 From dlong at openjdk.java.net Fri Dec 17 21:52:33 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 17 Dec 2021 21:52:33 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob [v2] In-Reply-To: References: Message-ID: On Fri, 17 Dec 2021 12:53:06 GMT, Doug Simon wrote: >> This PR fixes a discrepancy between `MethodData::_trap_hist_list` and `Deoptimization::Reason_LIMIT` in terms of JVMCI specific usage. >> >> JVMCI doubles the size of the deopt history for a method so that it can distinguish deopts in a normal method compilation from deopts in an OSR compilation: >> >> union { >> intptr_t _align; >> u1 _array[JVMCI_ONLY(2 *) MethodData::_trap_hist_limit]; >> } _trap_hist; >> >> >> To access the history for OSR deopts, the index for a deopt reason needs to be adjusted to the upper half of the history array. The amount used for the adjustment was incorrect and this PR fixes it: >> >> >> if (update_total_trap_count) { >> uint idx = reason; >> #if INCLUDE_JVMCI >> if (is_osr) { >> idx += Reason_TRAP_HISTORY_LENGTH; >> } >> #endif >> >> >> I introduced `Reason_TRAP_HISTORY_LENGTH` as a replacement for `25 JVMCI_ONLY(+5), // decoupled from Deoptimization::Reason_LIMIT` as this decoupling is unnecessary (as dangerous) as far as I can see. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use ARRAY_SIZE macro and add comments Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6855 From svkamath at openjdk.java.net Sat Dec 18 03:23:56 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Sat, 18 Dec 2021 03:23:56 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 [v3] In-Reply-To: References: Message-ID: > The failure happens with XX:+DeoptimizeAlot option. 
I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Added alignment to stack allocation, resolved indentation issue ------------- Changes: - all: https://git.openjdk.java.net/jdk18/pull/19/files - new: https://git.openjdk.java.net/jdk18/pull/19/files/f727ba16..12b16ac9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk18&pr=19&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk18&pr=19&range=01-02 Stats: 82 lines in 2 files changed: 19 ins; 13 del; 50 mod Patch: https://git.openjdk.java.net/jdk18/pull/19.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/19/head:pull/19 PR: https://git.openjdk.java.net/jdk18/pull/19 From svkamath at openjdk.java.net Sat Dec 18 03:23:59 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Sat, 18 Dec 2021 03:23:59 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 [v2] In-Reply-To: <-hoeuB7TNw0hDUF1Uy2YyYGPEIpfZ7VckcXDUiWed10=.d6c9d532-a1c6-454a-9918-39c743a3d82d@github.com> References: <-hoeuB7TNw0hDUF1Uy2YyYGPEIpfZ7VckcXDUiWed10=.d6c9d532-a1c6-454a-9918-39c743a3d82d@github.com> Message-ID: <-0MVFIxx7NtWWWNL4dxIhEPS-bmZAbNps7DIJyfFBv4=.cc08892b-6200-4d86-a205-e407445e3919@github.com> On Fri, 17 Dec 2021 19:24:22 GMT, Vladimir Kozlov wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix to allocate 48 additonal htbl entries in the stub. > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4444: > >> 4442: __ movptr(state, state_mem); >> 4443: #endif >> 4444: __ subptr(rsp, 96 * longSize); // Create space on the stack for htbl entries > > Is this aligned correctly? Or alignment does not matter? Thanks, Vladimir. 
I've addressed both your comments in the latest update. ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From kvn at openjdk.java.net Sat Dec 18 05:37:25 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 18 Dec 2021 05:37:25 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 [v3] In-Reply-To: References: Message-ID: <8_Y6CvYYStYChu9yHY9wKwXHiIao6ZbZ2gdZiYAnaDA=.f7238bd8-e556-4515-b7cf-9c141197eea7@github.com> On Sat, 18 Dec 2021 03:23:56 GMT, Smita Kamath wrote: >> The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Added alignment to stack allocation, resolved indentation issue src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4456: > 4454: __ aesgcm_encrypt(in, len, ct, out, key, state, subkeyHtbl, avx512_subkeyHtbl, counter); > 4455: > 4456: __ addptr(rsp, 96 * longSize); I don't think you need this instruction since you restore `RSP` in the next. Otherwise looks good. Testing passed fine. ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From dnsimon at openjdk.java.net Sat Dec 18 06:53:01 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Sat, 18 Dec 2021 06:53:01 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob [v3] In-Reply-To: References: Message-ID: > This PR fixes a discrepancy between `MethodData::_trap_hist_list` and `Deoptimization::Reason_LIMIT` in terms of JVMCI specific usage. 
> > JVMCI doubles the size of the deopt history for a method so that it can distinguish deopts in a normal method compilation from deopts in an OSR compilation: > > union { > intptr_t _align; > u1 _array[JVMCI_ONLY(2 *) MethodData::_trap_hist_limit]; > } _trap_hist; > > > To access the history for OSR deopts, the index for a deopt reason needs to be adjusted to the upper half of the history array. The amount used for the adjustment was incorrect and this PR fixes it: > > > if (update_total_trap_count) { > uint idx = reason; > #if INCLUDE_JVMCI > if (is_osr) { > idx += Reason_TRAP_HISTORY_LENGTH; > } > #endif > > > I introduced `Reason_TRAP_HISTORY_LENGTH` as a replacement for `25 JVMCI_ONLY(+5), // decoupled from Deoptimization::Reason_LIMIT` as this decoupling is unnecessary (as dangerous) as far as I can see. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fixed spelling in comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6855/files - new: https://git.openjdk.java.net/jdk/pull/6855/files/c07e1778..b1561afa Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6855&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6855&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6855/head:pull/6855 PR: https://git.openjdk.java.net/jdk/pull/6855 From dnsimon at openjdk.java.net Sat Dec 18 06:53:04 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Sat, 18 Dec 2021 06:53:04 GMT Subject: RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob [v3] In-Reply-To: <25NnHmVYKa3uaEB2PSkvbqlX8iwIFYItF2rDgcrC3ME=.56d9c8e8-c3b2-4f52-b8dd-ad8c88ddb4ba@github.com> References: <25NnHmVYKa3uaEB2PSkvbqlX8iwIFYItF2rDgcrC3ME=.56d9c8e8-c3b2-4f52-b8dd-ad8c88ddb4ba@github.com> Message-ID: On Thu, 16 Dec 2021 19:15:48 GMT, Tom Rodriguez 
wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed spelling in comment > > Marked as reviewed by never (Reviewer). Thanks for the reviews @tkrodriguez @dean-long @vnkozlov . ------------- PR: https://git.openjdk.java.net/jdk/pull/6855 From dnsimon at openjdk.java.net Sat Dec 18 06:53:06 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Sat, 18 Dec 2021 06:53:06 GMT Subject: Integrated: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 23:11:58 GMT, Doug Simon wrote: > This PR fixes a discrepancy between `MethodData::_trap_hist_list` and `Deoptimization::Reason_LIMIT` in terms of JVMCI specific usage. > > JVMCI doubles the size of the deopt history for a method so that it can distinguish deopts in a normal method compilation from deopts in an OSR compilation: > > union { > intptr_t _align; > u1 _array[JVMCI_ONLY(2 *) MethodData::_trap_hist_limit]; > } _trap_hist; > > > To access the history for OSR deopts, the index for a deopt reason needs to be adjusted to the upper half of the history array. The amount used for the adjustment was incorrect and this PR fixes it: > > > if (update_total_trap_count) { > uint idx = reason; > #if INCLUDE_JVMCI > if (is_osr) { > idx += Reason_TRAP_HISTORY_LENGTH; > } > #endif > > > I introduced `Reason_TRAP_HISTORY_LENGTH` as a replacement for `25 JVMCI_ONLY(+5), // decoupled from Deoptimization::Reason_LIMIT` as this decoupling is unnecessary (as dangerous) as far as I can see. This pull request has now been integrated. 
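For readers following the thread, the indexing scheme in the quoted description can be illustrated with a minimal, self-contained sketch. This is not the HotSpot code: `TrapHistory`, `kTrapHistLimit`, `index_for`, `record_trap` and `trap_count` are hypothetical names, the value 30 merely mirrors the `25 JVMCI_ONLY(+5)` figure quoted above, and saturating the counter is a choice made for this sketch. The point is only that OSR deopts live in the upper half of the doubled array, so the offset added to the reason has to equal the length of the lower half.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the per-method limit on deopt reasons.
constexpr std::size_t kTrapHistLimit = 30;

// Doubled history: slots [0, limit) count deopts from normal compilations,
// slots [limit, 2 * limit) count deopts from OSR compilations.
struct TrapHistory {
  std::array<std::uint8_t, 2 * kTrapHistLimit> counts{};

  // Map a deopt reason to its slot; OSR deopts are shifted by exactly the
  // length of the lower half, which keeps the index below 2 * limit.
  static std::size_t index_for(std::size_t reason, bool is_osr) {
    assert(reason < kTrapHistLimit && "reason out of bounds");
    return is_osr ? reason + kTrapHistLimit : reason;
  }

  void record_trap(std::size_t reason, bool is_osr) {
    std::uint8_t& slot = counts[index_for(reason, is_osr)];
    if (slot != UINT8_MAX) {  // saturate instead of wrapping around
      ++slot;
    }
  }

  unsigned trap_count(std::size_t reason, bool is_osr) const {
    return counts[index_for(reason, is_osr)];
  }
};
```

If the offset were larger than the lower-half length, the highest reasons would index past the end of the array for OSR deopts, which is the kind of out-of-bounds condition the assert in the subject line guards against.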
Changeset: 6f0e8da6 Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/6f0e8da6d3bef340299e48977d5e17d05eabe682 Stats: 21 lines in 5 files changed: 9 ins; 4 del; 8 mod 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob Reviewed-by: kvn, never, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/6855 From mli at openjdk.java.net Mon Dec 20 02:25:24 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Mon, 20 Dec 2021 02:25:24 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure In-Reply-To: References: Message-ID: <-TbqedIhD6KRbgkS1OOnTqUVeZ5vRTLTMDxQuetYrVo=.4ea31a7d-5c70-4177-b1cb-1620762c3e1c@github.com> On Fri, 17 Dec 2021 12:03:00 GMT, Thomas Schatzl wrote: >> The original PR is #6763, which should be retired because we have decided to adjust part of the optimization solution for evacuation failure (see #6627 for details), so the log will be adjusted accordingly. >> With this patch, the basic log output related to evacuation failure will look like this: >> >> >> [13.126s][debug][gc,phases] GC(0) Restore Retained Regions (ms): Min: 0.0, Avg: 197.4, Max: 1579.1, Diff: 1579.1, Sum: 1579.1, Workers: 8 >> [13.126s][debug][gc,phases] GC(0) Evacuation Failure Regions: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 1, Workers: 1 > > test/hotspot/jtreg/gc/g1/TestGCLogMessages.java line 268: > >> 266: new LogMessageWithLevel("Recalculate Used Memory", Level.DEBUG), >> 267: new LogMessageWithLevel("Restore Preserved Marks", Level.DEBUG), >> 268: new LogMessageWithLevel("Restore Retained Regions", Level.DEBUG), > > Please also check the work item in this test. Needs to be added just here. Thanks Thomas, I have added checks for the newly added work items.
------------- PR: https://git.openjdk.java.net/jdk/pull/6860 From mli at openjdk.java.net Mon Dec 20 02:38:57 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Mon, 20 Dec 2021 02:38:57 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure [v2] In-Reply-To: References: Message-ID: > The original PR is #6763, which should be retired because we have decided to adjust part of the optimization solution for evacuation failure (see #6627 for details), so the log will be adjusted accordingly. > With this patch, the basic log output related to evacuation failure will look like this: > > > [13.126s][debug][gc,phases] GC(0) Restore Retained Regions (ms): Min: 0.0, Avg: 197.4, Max: 1579.1, Diff: 1579.1, Sum: 1579.1, Workers: 8 > [13.126s][debug][gc,phases] GC(0) Evacuation Failure Regions: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 1, Workers: 1 Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Fix test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6860/files - new: https://git.openjdk.java.net/jdk/pull/6860/files/e70dc665..78cb8ffd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6860&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6860&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6860.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6860/head:pull/6860 PR: https://git.openjdk.java.net/jdk/pull/6860 From shade at openjdk.java.net Mon Dec 20 11:29:58 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 20 Dec 2021 11:29:58 GMT Subject: RFR: 8278146: G1: Rework VM_G1Concurrent VMOp to clearly identify it as pause [v3] In-Reply-To: References: Message-ID: <877OBDhOaXQDtM-0PcvoLsRlrsWSX50DLoA5-fzbvpM=.7e3c025b-8aae-41e2-b589-7ac26a42d4cc@github.com> > Our support engineers asked this: > >> I see these G1Concurrent safepoints in JDK17: >> [0.064s][info][safepoint ] Safepoint
"G1Concurrent", Time since last: 1666947 ns, Reaching > safepoint: 79150 ns, At safepoint: 349999 ns, Total: 429149 ns >> I've always thought that "concurrent" and "safepoint" are basically antonyms. >> What is a G1Concurrent safepoint? How can a concurrent event require a safepoint? > > I agree that's confusing. This patch splits the VM_G1Concurrent into two exactly named VMOp-s, so that we get: > > > [6.527s][info][gc ] GC(7) Pause Remark 64M->64M(224M) 218.847ms > [6.527s][info][safepoint] Safepoint "G1PauseRemark", Time since last: 17493991 ns, Reaching safepoint: 506830 ns, At safepoint: 218950374 ns, Total: 219457204 ns > [6.536s][info][gc ] GC(7) Pause Cleanup 71M->71M(224M) 0.177ms > [6.536s][info][safepoint] Safepoint "G1PauseCleanup", Time since last: 8250157 ns, Reaching safepoint: 884967 ns, At safepoint: 223964 ns, Total: 1108931 ns > [6.537s][info][gc ] GC(7) Concurrent Mark Cycle 247.051ms > > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Use override - Merge branch 'master' into JDK-8278146-g1-concurrent-vmop - Review Thomas - Merge branch 'master' into JDK-8278146-g1-concurrent-vmop - Whitespace and touchups - Basic implementation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6677/files - new: https://git.openjdk.java.net/jdk/pull/6677/files/06479f45..e26df883 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6677&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6677&range=01-02 Stats: 30248 lines in 906 files changed: 20806 ins; 5250 del; 4192 mod Patch: https://git.openjdk.java.net/jdk/pull/6677.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6677/head:pull/6677 PR: https://git.openjdk.java.net/jdk/pull/6677 From shade at openjdk.java.net Mon Dec 20 11:29:59 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 20 Dec 2021 11:29:59 GMT Subject: RFR: 8278146: G1: Rework VM_G1Concurrent VMOp to clearly identify it as pause [v2] In-Reply-To: References: Message-ID: <-yXYo2cqUO7Q_S3F1cIWatR5---2r2zUOewISRv-PM0=.0b058b81-4ac1-45a9-848c-ddc8040946f4@github.com> On Fri, 10 Dec 2021 00:38:33 GMT, Kim Barrett wrote: > Looks good. Regarding suggestions to use `override`, I don't need a re-review if you make those changes. Thanks, added `override`. It looks to me that the Hotspot style is to carry `virtual` only on the superclass declaration, and do `override` on all subclass overrides, dropping explicit `virtual` for them. True? See new commit that does that. ------------- PR: https://git.openjdk.java.net/jdk/pull/6677 From shade at openjdk.java.net Mon Dec 20 11:59:05 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 20 Dec 2021 11:59:05 GMT Subject: RFR: 8277893: Arraycopy stress tests [v5] In-Reply-To: References: Message-ID: > I would like to fork the new tests off the JDK-8150730. 
These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. > > A brief tour of these tests: > > - Tests all data types; > - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; > - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; > - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; > - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; > > My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. > > Test times: > > > # x86_64 (TR 3970X) > real 4m6.192s > user 52m50.523s > sys 0m13.755s > > # x86_64 (TR 3970X) -XX:+UseZGC > real 6m2.573s > user 72m43.541s > sys 0m25.697s > > # x86_32 (TR 3970X) > real 6m56.405s > user 92m56.377s > sys 0m6.677s > > # x86_64 (i5-11500) > real 29m19.024s > user 103m52.925s > sys 1m7.175s > > # AArch64 (ThunderX2) > real 2m59.623s > user 26m14.624s > sys 0m9.771s > > > Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. > > Additional testing: > - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` > - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` > - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 14 additional commits since the last revision: - Merge branch 'master' into JDK-8277893-arraycopy-tests - Bump timeout to 7200 - Merge branch 'master' into JDK-8277893-arraycopy-tests - Package declarations - Add safety check for small systems - Renames - Single driver for all the tests - Safer timeout settings - Post-merge TEST.groups cleanup - Merge branch 'master' into JDK-8277893-arraycopy-tests - ... and 4 more: https://git.openjdk.java.net/jdk/compare/07d279a8...6789eb8b ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6594/files - new: https://git.openjdk.java.net/jdk/pull/6594/files/b749c367..6789eb8b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6594&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6594&range=03-04 Stats: 20094 lines in 535 files changed: 14663 ins; 3580 del; 1851 mod Patch: https://git.openjdk.java.net/jdk/pull/6594.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6594/head:pull/6594 PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Mon Dec 20 11:59:14 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 20 Dec 2021 11:59:14 GMT Subject: RFR: 8277893: Arraycopy stress tests [v4] In-Reply-To: References: Message-ID: On Thu, 9 Dec 2021 07:12:47 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. 
>> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. >> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 4m6.192s >> user 52m50.523s >> sys 0m13.755s >> >> # x86_64 (TR 3970X) -XX:+UseZGC >> real 6m2.573s >> user 72m43.541s >> sys 0m25.697s >> >> # x86_32 (TR 3970X) >> real 6m56.405s >> user 92m56.377s >> sys 0m6.677s >> >> # x86_64 (i5-11500) >> real 29m19.024s >> user 103m52.925s >> sys 1m7.175s >> >> # AArch64 (ThunderX2) >> real 2m59.623s >> user 26m14.624s >> sys 0m9.771s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 13 additional commits since the last revision: > > - Bump timeout to 7200 > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - Package declarations > - Add safety check for small systems > - Renames > - Single driver for all the tests > - Safer timeout settings > - Post-merge TEST.groups cleanup > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - ... and 3 more: https://git.openjdk.java.net/jdk/compare/fbdbb5b1...b749c367 Rebased to current master. Tests still pass. I think I need a second (R)eviewer to push this, please. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From svkamath at openjdk.java.net Mon Dec 20 18:40:20 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Mon, 20 Dec 2021 18:40:20 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 [v4] In-Reply-To: References: Message-ID: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. 
Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Removed __addptr instruction as it was not needed ------------- Changes: - all: https://git.openjdk.java.net/jdk18/pull/19/files - new: https://git.openjdk.java.net/jdk18/pull/19/files/12b16ac9..0140f672 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk18&pr=19&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk18&pr=19&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk18/pull/19.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/19/head:pull/19 PR: https://git.openjdk.java.net/jdk18/pull/19 From sviswanathan at openjdk.java.net Mon Dec 20 18:40:22 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 20 Dec 2021 18:40:22 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 [v4] In-Reply-To: References: Message-ID: On Mon, 20 Dec 2021 18:36:46 GMT, Smita Kamath wrote: >> The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Removed __addptr instruction as it was not needed Changes look good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/19 From kvn at openjdk.java.net Mon Dec 20 18:43:20 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 20 Dec 2021 18:43:20 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 [v4] In-Reply-To: References: Message-ID: On Mon, 20 Dec 2021 18:40:20 GMT, Smita Kamath wrote: >> The failure happens with XX:+DeoptimizeAlot option. 
I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Removed __addptr instruction as it was not needed I tested without `addptr` instructions and it passed. It is good for integration. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/19 From svkamath at openjdk.java.net Mon Dec 20 18:48:09 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Mon, 20 Dec 2021 18:48:09 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Fri, 17 Dec 2021 03:14:07 GMT, Vladimir Kozlov wrote: >> The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. > > Note, we can't change code in `PredicatedIntrinsicGenerator` because it is used by all intrinsics with predicate and none do re-execution. @vnkozlov Thanks a lot for approving this PR. I appreciate your help. ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From duke at openjdk.java.net Mon Dec 20 19:49:09 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Mon, 20 Dec 2021 19:49:09 GMT Subject: RFR: 8278868: Add x86 vectorization support for Long.bitCount() [v2] In-Reply-To: References: Message-ID: > Vectorization support of Integer.bitCount() already exists but currently the same support is lacking for Long.bitCount(). Similar to the C2 PopCountVI node, we created a C2 PopCountVL node and used vpopcntq x86 instruction to enable vectorized Long.bitCount(). This patch shows 2.57x improvement in performance on a JMH micro benchmark due to x86 vectorization.
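For readers following the thread, here is a minimal sketch of the kind of scalar loop that this sort of vectorization is aimed at: one `Long.bitCount()` call per element, with `long` inputs and `int` results. The class and method names (`LongBitCountSketch`, `bitCounts`) are illustrative assumptions and do not come from the PR or its JMH benchmark. Whether C2 actually vectorizes such a loop depends on the JIT and on the CPU supporting `vpopcntq`.

```java
import java.util.Arrays;

public class LongBitCountSketch {
    // Hypothetical helper: counts the set bits of each long into an int array.
    // A simple counted loop like this is the shape an auto-vectorizer looks for.
    static int[] bitCounts(long[] input) {
        int[] output = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            output[i] = Long.bitCount(input[i]);
        }
        return output;
    }

    public static void main(String[] args) {
        long[] data = {0L, 1L, -1L, 0xF0F0L};
        System.out.println(Arrays.toString(bitCounts(data)));
    }
}
```

The result is the same with or without vectorization; only throughput would differ, which is what a JMH benchmark would measure.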
Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Add JMH micro benchmark to measure performance ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6857/files - new: https://git.openjdk.java.net/jdk/pull/6857/files/60d976f3..4567eab8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6857&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6857&range=00-01 Stats: 87 lines in 1 file changed: 87 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6857.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6857/head:pull/6857 PR: https://git.openjdk.java.net/jdk/pull/6857 From duke at openjdk.java.net Mon Dec 20 19:49:10 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Mon, 20 Dec 2021 19:49:10 GMT Subject: RFR: 8278868: Add x86 vectorization support for Long.bitCount() In-Reply-To: References: Message-ID: On Wed, 15 Dec 2021 23:51:19 GMT, Vamsi Parasa wrote: > Vectorization support of Integer.bitCount() already exists but currently the same support is lacking for Long.bitCount(). Similar to the C2 PopCountVI node, we created a C2 PopCountVL node and used vpopcntq x86 instruction to enable vectorized Long.bitCount(). This patch shows 2.57x improvement in performance on a JMH micro benchmark due to x86 vectorization. This patch shows 2.57x improvement in performance on a JMH micro benchmark due to x86 vectorization. ------------- PR: https://git.openjdk.java.net/jdk/pull/6857 From sviswanathan at openjdk.java.net Mon Dec 20 20:12:42 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 20 Dec 2021 20:12:42 GMT Subject: [jdk18] RFR: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Fri, 17 Dec 2021 03:14:07 GMT, Vladimir Kozlov wrote: >> The failure happens with XX:+DeoptimizeAlot option. 
I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. > > Note, we can't change code in `PredicatedIntrinsicGenerator` because it is used by all intrinsics with predicate and none do re-execution. @vnkozlov @dean-long Thanks a lot for guiding Smita through this fix. ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From svkamath at openjdk.java.net Mon Dec 20 20:12:43 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Mon, 20 Dec 2021 20:12:43 GMT Subject: [jdk18] Integrated: 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 06:16:23 GMT, Smita Kamath wrote: > The failure happens with XX:+DeoptimizeAlot option. I've set reexecute bit and reset the appropriate state for the interpreter to execute the code when deoptimization occurs. This pull request has now been integrated. Changeset: 819f9bd0 Author: Smita Kamath Committer: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk18/commit/819f9bd084fa49222a4310fbcf4933005e9f0ca4 Stats: 63 lines in 12 files changed: 11 ins; 43 del; 9 mod 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 Reviewed-by: sviswanathan, kvn ------------- PR: https://git.openjdk.java.net/jdk18/pull/19 From dcubed at openjdk.java.net Mon Dec 20 23:19:51 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 20 Dec 2021 23:19:51 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() [v2] In-Reply-To: <7rTs9aXIVR59TahwoJbmn9gU-JYzRS8XlySAUqNujgQ=.baeed9fa-5608-477f-9e5b-d3af7044000b@github.com> References: <7rTs9aXIVR59TahwoJbmn9gU-JYzRS8XlySAUqNujgQ=.baeed9fa-5608-477f-9e5b-d3af7044000b@github.com> Message-ID: On Fri, 17 Dec 2021 12:02:49 GMT, Robbin Ehn wrote: >> Please consider this enhancement.
>> This makes Thread.currentThread() eight times faster on my box when running in interpreter. >> >> Passes t1-t4 >> >> As suggested I added a related fix to Shenandoah. >> Shenandoah LB was using InterpreterMacroAssembler version of call_VM_leaf_base (it's virtual). >> The interpreter version adds a check on last_sp, since the intrinsic is not setting up a new frame, this check is faulty. >> Other GC seems to always use the base version, so let's use the base version in Shenandoah also. >> No issues found when locally running gc/shenandoah. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use resolve_oop_handle instead Thumbs up. Looks good to me. I don't quite understand the Shenandoah changes, but I think Roman has that covered. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6833 From svkamath at openjdk.java.net Tue Dec 21 01:00:42 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 21 Dec 2021 01:00:42 GMT Subject: [jdk18] RFR: 8279045: Intrinsics missing vzeroupper instruction Message-ID: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> Adding vzeroupper instruction to aes and shift intrinsics. 
------------- Commit messages: - Adding vzeroupper instruction to aes and shift intrinsics Changes: https://git.openjdk.java.net/jdk18/pull/52/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=52&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8279045 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk18/pull/52.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/52/head:pull/52 PR: https://git.openjdk.java.net/jdk18/pull/52 From jbhateja at openjdk.java.net Tue Dec 21 05:18:16 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 21 Dec 2021 05:18:16 GMT Subject: RFR: 8278868: Add x86 vectorization support for Long.bitCount() [v2] In-Reply-To: References: Message-ID: On Mon, 20 Dec 2021 19:49:09 GMT, Vamsi Parasa wrote: >> Vectorization support of Integer.bitCount() already exists but currently the same support is lacking for Long.bitCount(). Similar to the C2 PopCountVI node, we created a C2 PopCountVL node and used vpopcntq x86 instruction to enable vectorized Long.bitCount(). This patch shows 2.57x improvement in performance on a JMH micro benchmark due to x86 vectorization. > > Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Add JMH micro benchmark to measure performance src/hotspot/share/opto/superword.cpp line 2951: > 2949: if (VectorNode::is_vpopcntq(use)) { > 2950: // VPOPCNTQ takes longs and produces ints - hence the special checks > 2951: // on alignment and size. Use IR node reference instead of target specific instruction. src/hotspot/share/opto/vectornode.hpp line 517: > 515: }; > 516: > 517: //------------------------------SqrtVFNode-------------------------------------- I think we can remove "I" specialization from existing PopCountVI and make the IR node generic. It already has a type which should be sufficient to emit type specific instruction. 
There are many vector nodes that are common across types. test/hotspot/jtreg/compiler/vectorization/TestPopCountVectorLong.java line 65: > 63: } > 64: > 65: public void vectorizeBitCount() { We can add a check based on the new IR framework here. ------------- PR: https://git.openjdk.java.net/jdk/pull/6857 From dholmes at openjdk.java.net Tue Dec 21 05:46:14 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 21 Dec 2021 05:46:14 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() [v2] In-Reply-To: <7rTs9aXIVR59TahwoJbmn9gU-JYzRS8XlySAUqNujgQ=.baeed9fa-5608-477f-9e5b-d3af7044000b@github.com> References: <7rTs9aXIVR59TahwoJbmn9gU-JYzRS8XlySAUqNujgQ=.baeed9fa-5608-477f-9e5b-d3af7044000b@github.com> Message-ID: On Fri, 17 Dec 2021 12:02:49 GMT, Robbin Ehn wrote: >> Please consider this enhancement. >> This makes Thread.currentThread() eight times faster on my box when running in interpreter. >> >> Passes t1-t4 >> >> As suggested I added a related fix to Shenandoah. >> Shenandoah LB was using InterpreterMacroAssembler version of call_VM_leaf_base (it's virtual). >> The interpreter version adds a check on last_sp, since the intrinsic is not setting up a new frame, this check is faulty. >> Other GC seems to always use the base version, so let's use the base version in Shenandoah also. >> No issues found when locally running gc/shenandoah. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use resolve_oop_handle instead Ditto what Dan said :) Historically we haven't worried about currentThread() interpreter performance because the method is so hot during VM init that it should be quickly compiled anyway. Perhaps that has changed over time and we didn't notice. It would be interesting to see if there is any observable startup benefit here. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6833 From duke at openjdk.java.net Tue Dec 21 05:48:16 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Tue, 21 Dec 2021 05:48:16 GMT Subject: RFR: 8278868: Add x86 vectorization support for Long.bitCount() [v2] In-Reply-To: References: Message-ID: On Tue, 21 Dec 2021 05:06:26 GMT, Jatin Bhateja wrote: >> Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add JMH micro benchmark to measure performance > > src/hotspot/share/opto/superword.cpp line 2951: > >> 2949: if (VectorNode::is_vpopcntq(use)) { >> 2950: // VPOPCNTQ takes longs and produces ints - hence the special checks >> 2951: // on alignment and size. > > Use IR node reference instead of target specific instruction. Thanks Jatin for noticing that! Will rename the functions to have generic names instead of target specific names. Will also modify the comment to be generic... ------------- PR: https://git.openjdk.java.net/jdk/pull/6857 From jbhateja at openjdk.java.net Tue Dec 21 06:50:15 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 21 Dec 2021 06:50:15 GMT Subject: RFR: 8278868: Add x86 vectorization support for Long.bitCount() [v2] In-Reply-To: References: Message-ID: <5IK_9IPsVuhhBDHnJWZ6xa58QpTf0ydnLH7nlo97imw=.6b51dff4-cf21-4f91-b6f1-38175404a74a@github.com> On Tue, 21 Dec 2021 04:53:29 GMT, Jatin Bhateja wrote: >> Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add JMH micro benchmark to measure performance > > src/hotspot/share/opto/vectornode.hpp line 517: > >> 515: }; >> 516: >> 517: //------------------------------SqrtVFNode-------------------------------------- > > I think we can remove "I" specialization from existing PopCountVI and make the IR node generic. It already has a type which should be sufficient to emit type specific instruction. There are many vector nodes that are common across types.
Since it's already supported by AArch64/PPC, we can keep your existing changes as is. ------------- PR: https://git.openjdk.java.net/jdk/pull/6857 From mli at openjdk.java.net Tue Dec 21 07:35:26 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Tue, 21 Dec 2021 07:35:26 GMT Subject: RFR: 8277893: Arraycopy stress tests [v5] In-Reply-To: References: Message-ID: On Mon, 20 Dec 2021 11:59:05 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. >> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain.
>> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 4m6.192s >> user 52m50.523s >> sys 0m13.755s >> >> # x86_64 (TR 3970X) -XX:+UseZGC >> real 6m2.573s >> user 72m43.541s >> sys 0m25.697s >> >> # x86_32 (TR 3970X) >> real 6m56.405s >> user 92m56.377s >> sys 0m6.677s >> >> # x86_64 (i5-11500) >> real 29m19.024s >> user 103m52.925s >> sys 1m7.175s >> >> # AArch64 (ThunderX2) >> real 2m59.623s >> user 26m14.624s >> sys 0m9.771s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - Bump timeout to 7200 > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - Package declarations > - Add safety check for small systems > - Renames > - Single driver for all the tests > - Safer timeout settings > - Post-merge TEST.groups cleanup > - Merge branch 'master' into JDK-8277893-arraycopy-tests > - ... and 4 more: https://git.openjdk.java.net/jdk/compare/88e52bed...6789eb8b LGTM. Just a minor comment on the test code. test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 125: > 123: > 124: testWith(size, l, r, len); > 125: testWith(size, r, l, len); The checks and testWith invocations for disjoint and conjoint are almost the same, except for the "Not disjoint" and "Not conjoint" asserts; they could be consolidated. ------------- Marked as reviewed by mli (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6594 From rkennke at openjdk.java.net Tue Dec 21 08:47:21 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 21 Dec 2021 08:47:21 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() [v2] In-Reply-To: References: <7rTs9aXIVR59TahwoJbmn9gU-JYzRS8XlySAUqNujgQ=.baeed9fa-5608-477f-9e5b-d3af7044000b@github.com> Message-ID: On Mon, 20 Dec 2021 23:14:49 GMT, Daniel D. Daugherty wrote: > Thumbs up. Looks good to me. I don't quite understand the Shenandoah changes, but I think Roman has that covered. That's because the intrinsic doesn't have an interpreter frame. The normal call_VM_leaf() checks the sp register. We can bypass this by using super_call_VM_leaf() instead. ------------- PR: https://git.openjdk.java.net/jdk/pull/6833 From shade at openjdk.java.net Tue Dec 21 09:15:46 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 21 Dec 2021 09:15:46 GMT Subject: RFR: 8277893: Arraycopy stress tests [v6] In-Reply-To: References: Message-ID: > I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. 
> > A brief tour of these tests: > > - Tests all data types; > - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; > - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; > - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; > - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; > > My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. > > Test times: > > > # x86_64 (TR 3970X) > real 4m6.192s > user 52m50.523s > sys 0m13.755s > > # x86_64 (TR 3970X) -XX:+UseZGC > real 6m2.573s > user 72m43.541s > sys 0m25.697s > > # x86_32 (TR 3970X) > real 6m56.405s > user 92m56.377s > sys 0m6.677s > > # x86_64 (i5-11500) > real 29m19.024s > user 103m52.925s > sys 1m7.175s > > # AArch64 (ThunderX2) > real 2m59.623s > user 26m14.624s > sys 0m9.771s > > > Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. 
> > Additional testing: > - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` > - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` > - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Peel out check{Bounds,Conjoint,Disjoint} ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6594/files - new: https://git.openjdk.java.net/jdk/pull/6594/files/6789eb8b..bb823dbd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6594&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6594&range=04-05 Stats: 31 lines in 1 file changed: 20 ins; 6 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6594.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6594/head:pull/6594 PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Tue Dec 21 09:15:54 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 21 Dec 2021 09:15:54 GMT Subject: RFR: 8277893: Arraycopy stress tests [v5] In-Reply-To: References: Message-ID: On Tue, 21 Dec 2021 07:31:26 GMT, Hamlin Li wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8277893-arraycopy-tests >> - Bump timeout to 7200 >> - Merge branch 'master' into JDK-8277893-arraycopy-tests >> - Package declarations >> - Add safety check for small systems >> - Renames >> - Single driver for all the tests >> - Safer timeout settings >> - Post-merge TEST.groups cleanup >> - Merge branch 'master' into JDK-8277893-arraycopy-tests >> - ... 
and 4 more: https://git.openjdk.java.net/jdk/compare/2b18a990...6789eb8b > > test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 125: > >> 123: >> 124: testWith(size, l, r, len); >> 125: testWith(size, r, l, len); > > The checks and testWith invocations for disjoint and conjoint are almost the same, except for the "Not disjoint" and "Not conjoint" asserts; they could be consolidated. Yes, good suggestion. See new commit. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From mli at openjdk.java.net Tue Dec 21 09:54:17 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Tue, 21 Dec 2021 09:54:17 GMT Subject: RFR: 8277893: Arraycopy stress tests [v6] In-Reply-To: References: Message-ID: <7_iDLPnePED0yvcfLbN0-Q17YxYuLsloQvQeygo9yo0=.288dd122-064d-4318-a437-0d69e6912a58@github.com> On Tue, 21 Dec 2021 09:15:46 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable.
>> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. >> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 4m6.192s >> user 52m50.523s >> sys 0m13.755s >> >> # x86_64 (TR 3970X) -XX:+UseZGC >> real 6m2.573s >> user 72m43.541s >> sys 0m25.697s >> >> # x86_32 (TR 3970X) >> real 6m56.405s >> user 92m56.377s >> sys 0m6.677s >> >> # x86_64 (i5-11500) >> real 29m19.024s >> user 103m52.925s >> sys 1m7.175s >> >> # AArch64 (ThunderX2) >> real 2m59.623s >> user 26m14.624s >> sys 0m9.771s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Peel out check{Bounds,Conjoint,Disjoint} Thanks for updating, looks good. 
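The consolidation discussed in this exchange, with the bounds, conjoint, and disjoint checks peeled out into separate helpers, might look roughly like the sketch below. This is not the code from the PR: the class name and method bodies are assumptions made for illustration, and "conjoint" is taken here to mean that the two equal-length ranges overlap.

```java
public class ArrayRangeChecksSketch {
    // Both ranges [l, l+len) and [r, r+len) must lie inside an array of length size.
    static void checkBounds(int size, int l, int r, int len) {
        if (l < 0 || r < 0 || len < 0
                || (long) l + len > size || (long) r + len > size) {
            throw new IllegalStateException("Out of bounds: size=" + size
                    + ", l=" + l + ", r=" + r + ", len=" + len);
        }
    }

    // True when the two equal-length ranges share at least one index.
    static boolean overlaps(int l, int r, int len) {
        return len > 0 && Math.abs((long) l - (long) r) < len;
    }

    // Shared bounds check, then the only part that differs between the two cases.
    static void checkConjoint(int size, int l, int r, int len) {
        checkBounds(size, l, r, len);
        if (!overlaps(l, r, len)) {
            throw new IllegalStateException("Not conjoint: l=" + l + ", r=" + r + ", len=" + len);
        }
    }

    static void checkDisjoint(int size, int l, int r, int len) {
        checkBounds(size, l, r, len);
        if (overlaps(l, r, len)) {
            throw new IllegalStateException("Not disjoint: l=" + l + ", r=" + r + ", len=" + len);
        }
    }

    public static void main(String[] args) {
        checkConjoint(16, 0, 2, 4);  // ranges [0,4) and [2,6) overlap
        checkDisjoint(16, 0, 8, 4);  // ranges [0,4) and [8,12) do not
        System.out.println("range checks passed");
    }
}
```

Keeping the bounds check in one place means the conjoint and disjoint paths differ only in the overlap assertion, which is the duplication the review comment pointed at.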
------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Tue Dec 21 14:05:24 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 21 Dec 2021 14:05:24 GMT Subject: RFR: 8277893: Arraycopy stress tests [v6] In-Reply-To: References: Message-ID: On Tue, 21 Dec 2021 09:15:46 GMT, Aleksey Shipilev wrote: >> I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. >> >> A brief tour of these tests: >> >> - Tests all data types; >> - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; >> - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; >> - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; >> - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; >> >> My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. 
>> >> Test times: >> >> >> # x86_64 (TR 3970X) >> real 4m6.192s >> user 52m50.523s >> sys 0m13.755s >> >> # x86_64 (TR 3970X) -XX:+UseZGC >> real 6m2.573s >> user 72m43.541s >> sys 0m25.697s >> >> # x86_32 (TR 3970X) >> real 6m56.405s >> user 92m56.377s >> sys 0m6.677s >> >> # x86_64 (i5-11500) >> real 29m19.024s >> user 103m52.925s >> sys 1m7.175s >> >> # AArch64 (ThunderX2) >> real 2m59.623s >> user 26m14.624s >> sys 0m9.771s >> >> >> Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` >> - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` >> - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Peel out check{Bounds,Conjoint,Disjoint} GHA are clean for the latest revision. This is test-only PR for JDK 19, so any future problems with it can be resolved next year. Meanwhile, we can get more testing over the end-of-the-year break. I am integrating. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Tue Dec 21 14:05:24 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 21 Dec 2021 14:05:24 GMT Subject: Integrated: 8277893: Arraycopy stress tests In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 13:28:33 GMT, Aleksey Shipilev wrote: > I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. 
> > A brief tour of these tests: > > - Tests all data types; > - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; > - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; > - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; > - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; > > My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. > > Test times: > > > # x86_64 (TR 3970X) > real 4m6.192s > user 52m50.523s > sys 0m13.755s > > # x86_64 (TR 3970X) -XX:+UseZGC > real 6m2.573s > user 72m43.541s > sys 0m25.697s > > # x86_32 (TR 3970X) > real 6m56.405s > user 92m56.377s > sys 0m6.677s > > # x86_64 (i5-11500) > real 29m19.024s > user 103m52.925s > sys 1m7.175s > > # AArch64 (ThunderX2) > real 2m59.623s > user 26m14.624s > sys 0m9.771s > > > Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. > > Additional testing: > - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` > - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` > - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` This pull request has now been integrated. 
Changeset: 29bd7363 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/29bd73638a22d341767a1266723a7d7263e17093 Stats: 1154 lines in 12 files changed: 1153 ins; 0 del; 1 mod 8277893: Arraycopy stress tests Reviewed-by: kvn, mli ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From neliasso at openjdk.java.net Tue Dec 21 14:32:25 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 21 Dec 2021 14:32:25 GMT Subject: [jdk18] RFR: 8279045: Intrinsics missing vzeroupper instruction In-Reply-To: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> References: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> Message-ID: On Tue, 21 Dec 2021 00:51:51 GMT, Smita Kamath wrote: > Adding vzeroupper instruction to aes and shift intrinsics. Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/52 From tschatzl at openjdk.java.net Tue Dec 21 15:16:18 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 21 Dec 2021 15:16:18 GMT Subject: RFR: 8278146: G1: Rework VM_G1Concurrent VMOp to clearly identify it as pause [v3] In-Reply-To: <877OBDhOaXQDtM-0PcvoLsRlrsWSX50DLoA5-fzbvpM=.7e3c025b-8aae-41e2-b589-7ac26a42d4cc@github.com> References: <877OBDhOaXQDtM-0PcvoLsRlrsWSX50DLoA5-fzbvpM=.7e3c025b-8aae-41e2-b589-7ac26a42d4cc@github.com> Message-ID: On Mon, 20 Dec 2021 11:29:58 GMT, Aleksey Shipilev wrote: >> Our support engineers asked this: >> >>> I see these G1Concurrent safepoints in JDK17: >>> [0.064s][info][safepoint ] Safepoint "G1Concurrent", Time since last: 1666947 ns, Reaching >> safepoint: 79150 ns, At safepoint: 349999 ns, Total: 429149 ns >>> I've always thought that "concurrent" and "safepoint" are basically antonyms. >>> What is a G1Concurrent safepoint? How can a concurrent event require a safepoint? >> >> I agree that's confusing. 
This patch splits the VM_G1Concurrent into two exactly named VMOp-s, so that we get: >> >> >> [6.527s][info][gc ] GC(7) Pause Remark 64M->64M(224M) 218.847ms >> [6.527s][info][safepoint] Safepoint "G1PauseRemark", Time since last: 17493991 ns, Reaching safepoint: 506830 ns, At safepoint: 218950374 ns, Total: 219457204 ns >> [6.536s][info][gc ] GC(7) Pause Cleanup 71M->71M(224M) 0.177ms >> [6.536s][info][safepoint] Safepoint "G1PauseCleanup", Time since last: 8250157 ns, Reaching safepoint: 884967 ns, At safepoint: 223964 ns, Total: 1108931 ns >> [6.537s][info][gc ] GC(7) Concurrent Mark Cycle 247.051ms >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Use override > - Merge branch 'master' into JDK-8278146-g1-concurrent-vmop > - Review Thomas > - Merge branch 'master' into JDK-8278146-g1-concurrent-vmop > - Whitespace and touchups > - Basic implementation Still good. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6677 From rehn at openjdk.java.net Tue Dec 21 16:08:17 2021 From: rehn at openjdk.java.net (Robbin Ehn) Date: Tue, 21 Dec 2021 16:08:17 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() [v2] In-Reply-To: References: <7rTs9aXIVR59TahwoJbmn9gU-JYzRS8XlySAUqNujgQ=.baeed9fa-5608-477f-9e5b-d3af7044000b@github.com> Message-ID: On Tue, 21 Dec 2021 08:43:52 GMT, Roman Kennke wrote: > Thumbs up. Looks good to me. I don't quite understand the Shenandoah changes, but I think Roman has that covered. 
Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/6833 From rehn at openjdk.java.net Tue Dec 21 16:12:18 2021 From: rehn at openjdk.java.net (Robbin Ehn) Date: Tue, 21 Dec 2021 16:12:18 GMT Subject: RFR: 8278793: Interpreter(x64) intrinsify Thread.currentThread() [v2] In-Reply-To: References: <7rTs9aXIVR59TahwoJbmn9gU-JYzRS8XlySAUqNujgQ=.baeed9fa-5608-477f-9e5b-d3af7044000b@github.com> Message-ID: On Tue, 21 Dec 2021 05:43:16 GMT, David Holmes wrote: > Ditto what Dan said :) > > Historically we haven't worried about currentThread() interpreter performance because the method is so hot during VM init that it should be quickly compiled anyway. Perhaps that has changed over time and we didn't notice. It would be interesting to see if there is any observable startup benefit here. > > Thanks, David Thanks We call currentThread() around 600 times in the interpreter during startup. But it is not noticeable, even if we are theoretically 20us faster. ------------- PR: https://git.openjdk.java.net/jdk/pull/6833 From rehn at openjdk.java.net Tue Dec 21 16:17:24 2021 From: rehn at openjdk.java.net (Robbin Ehn) Date: Tue, 21 Dec 2021 16:17:24 GMT Subject: Integrated: 8278793: Interpreter(x64) intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Tue, 14 Dec 2021 15:08:54 GMT, Robbin Ehn wrote: > Please consider this enhancement. > This makes Thread.currentThread() eight times faster on my box when running in the interpreter. > > Passes t1-t4 > > As suggested I added a related fix to Shenandoah. > Shenandoah LB was using the InterpreterMacroAssembler version of call_VM_leaf_base (it's virtual). > The interpreter version adds a check on last_sp; since the intrinsic does not set up a new frame, this check is faulty. > Other GCs seem to always use the base version, so let's use the base version in Shenandoah also. > No issues found when locally running gc/shenandoah. This pull request has now been integrated.
Changeset: f7309060 Author: Robbin Ehn URL: https://git.openjdk.java.net/jdk/commit/f7309060ded0edb1e614663572f876d83b77c28e Stats: 33 lines in 6 files changed: 29 ins; 0 del; 4 mod 8278793: Interpreter(x64) intrinsify Thread.currentThread() Reviewed-by: rkennke, dcubed, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6833 From coleenp at openjdk.java.net Tue Dec 21 17:06:24 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 21 Dec 2021 17:06:24 GMT Subject: Withdrawn: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d In-Reply-To: References: Message-ID: On Mon, 20 Dec 2021 22:59:39 GMT, Coleen Phillimore wrote: > Thanks to @stefank and @fisk for the diagnosis. ZGC is looking at metadata in unloaded nmethods, but redefinition doesn't keep this metadata from being deallocated. Change the iterators that mark metadata in use to walk unloaded nmethods as well as other alive nmethods. > > The test case reproduces the crash on windows if run in an 100 iteration loop. This fix does not crash in the same test. Also ran tier1-6. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6900 From coleenp at openjdk.java.net Tue Dec 21 17:06:24 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 21 Dec 2021 17:06:24 GMT Subject: RFR: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d In-Reply-To: References: Message-ID: <0dVeN-K7oWhiFEQEkJnH-jL1mk7xhg33YU_2YBtAlWE=.28779fbd-c92b-4dac-b2c4-b7efb516a2e8@github.com> On Mon, 20 Dec 2021 22:59:39 GMT, Coleen Phillimore wrote: > Thanks to @stefank and @fisk for the diagnosis. ZGC is looking at metadata in unloaded nmethods, but redefinition doesn't keep this metadata from being deallocated. 
Change the iterators that mark metadata in use to walk unloaded nmethods as well as other alive nmethods. > > The test case reproduces the crash on Windows if run in a 100-iteration loop. With this fix, the same test does not crash. Also ran tier1-6. I should have made this change to the JDK 18 repository. Closing this one. ------------- PR: https://git.openjdk.java.net/jdk/pull/6900 From sviswanathan at openjdk.java.net Tue Dec 21 17:39:25 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 21 Dec 2021 17:39:25 GMT Subject: [jdk18] RFR: 8279045: Intrinsics missing vzeroupper instruction In-Reply-To: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> References: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> Message-ID: On Tue, 21 Dec 2021 00:51:51 GMT, Smita Kamath wrote: > Adding vzeroupper instruction to aes and shift intrinsics. The patch looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/52 From svkamath at openjdk.java.net Tue Dec 21 18:44:20 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 21 Dec 2021 18:44:20 GMT Subject: [jdk18] RFR: 8279045: Intrinsics missing vzeroupper instruction In-Reply-To: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> References: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> Message-ID: On Tue, 21 Dec 2021 00:51:51 GMT, Smita Kamath wrote: > Adding vzeroupper instruction to aes and shift intrinsics. @vnkozlov Could this code change be integrated? Please advise. If it needs testing, would it be possible for you to help me out? Thank you.
------------- PR: https://git.openjdk.java.net/jdk18/pull/52 From kvn at openjdk.java.net Tue Dec 21 19:51:16 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 21 Dec 2021 19:51:16 GMT Subject: [jdk18] RFR: 8279045: Intrinsics missing vzeroupper instruction In-Reply-To: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> References: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> Message-ID: On Tue, 21 Dec 2021 00:51:51 GMT, Smita Kamath wrote: > Adding vzeroupper instruction to aes and shift intrinsics. Please update to the latest changes in JDK18 (there were changes pushed to stubs #19). I started testing. ------------- PR: https://git.openjdk.java.net/jdk18/pull/52 From svkamath at openjdk.java.net Tue Dec 21 19:57:56 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 21 Dec 2021 19:57:56 GMT Subject: [jdk18] RFR: 8279045: Intrinsics missing vzeroupper instruction [v2] In-Reply-To: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> References: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> Message-ID: > Adding vzeroupper instruction to aes and shift intrinsics. Smita Kamath has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: - Merge branch 'master' - Adding vzeroupper instruction to aes and shift intrinsics ------------- Changes: - all: https://git.openjdk.java.net/jdk18/pull/52/files - new: https://git.openjdk.java.net/jdk18/pull/52/files/f8021390..1b131eec Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk18&pr=52&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk18&pr=52&range=00-01 Stats: 1689 lines in 115 files changed: 1120 ins; 135 del; 434 mod Patch: https://git.openjdk.java.net/jdk18/pull/52.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/52/head:pull/52 PR: https://git.openjdk.java.net/jdk18/pull/52 From jwilhelm at openjdk.java.net Tue Dec 21 19:59:51 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Tue, 21 Dec 2021 19:59:51 GMT Subject: RFR: Merge jdk18 Message-ID: <2qAXnV7AsRJ0Z5Of9gkLOHIACVFGRfrl3MVhJsfpSew=.b02d2da2-3ce5-497d-a979-4e602c3834be@github.com> Forwardport JDK 18 -> JDK 19 ------------- Commit messages: - Merge - 8278627: Shenandoah: TestHeapDump test failed - 8279074: ProblemList compiler/codecache/jmx/PoolsIndependenceTest.java on macosx-aarch64 - 8278044: ObjectInputStream methods invoking the OIF.CFG.getSerialFilterFactory() silent about error cases. - 8278087: Deserialization filter and filter factory property error reporting under specified - 8279011: JFR: JfrChunkWriter incorrectly handles int64_t chunk size as size_t - 8274323: compiler/codegen/aes/TestAESMain.java failed with "Error: invalid offset: -1434443640" after 8273297 - 8278609: [macos] accessibility frame is misplaced on a secondary monitor on macOS - 8278413: C2 crash when allocating array of size too large - 8278970: [macos] SigningPackageTest is failed with runtime exception - ... 
and 9 more: https://git.openjdk.java.net/jdk/compare/f31dead6...d5d5bad9 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=6911&range=00.0 - jdk18: https://webrevs.openjdk.java.net/?repo=jdk&pr=6911&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/6911/files Stats: 1086 lines in 80 files changed: 695 ins; 107 del; 284 mod Patch: https://git.openjdk.java.net/jdk/pull/6911.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6911/head:pull/6911 PR: https://git.openjdk.java.net/jdk/pull/6911 From kvn at openjdk.java.net Tue Dec 21 21:17:16 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 21 Dec 2021 21:17:16 GMT Subject: [jdk18] RFR: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d In-Reply-To: References: Message-ID: <-5p48Ttu4Th7IdX7X2HqEdQYm1iFRxkyXQFB4Hnswvw=.ae121f39-801c-4d03-8759-739ccf3e5b10@github.com> On Tue, 21 Dec 2021 20:51:04 GMT, Coleen Phillimore wrote: > This is the fix for https://github.com/openjdk/jdk/pull/6900 retargeted to JDK 18. > > Thanks to @stefank and @fisk for the diagnosis. ZGC is looking at metadata in unloaded nmethods, but redefinition doesn't keep this metadata from being deallocated. Change the iterators that mark metadata in use to walk unloaded nmethods as well as other alive nmethods. > > The test case reproduces the crash on windows if run in an 100 iteration loop. This fix does not crash in the same test. Also ran tier1-6. > > Above testing in progress. okay ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk18/pull/63 From kvn at openjdk.java.net Tue Dec 21 21:31:22 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 21 Dec 2021 21:31:22 GMT Subject: [jdk18] RFR: 8279045: Intrinsics missing vzeroupper instruction [v2] In-Reply-To: References: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> Message-ID: On Tue, 21 Dec 2021 19:57:56 GMT, Smita Kamath wrote: >> Adding vzeroupper instruction to aes and shift intrinsics. > > Smita Kamath has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' > - Adding vzeroupper instruction to aes and shift intrinsics Testing passed. Good to integrate. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/52 From jwilhelm at openjdk.java.net Tue Dec 21 22:02:03 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Tue, 21 Dec 2021 22:02:03 GMT Subject: Integrated: Merge jdk18 In-Reply-To: <2qAXnV7AsRJ0Z5Of9gkLOHIACVFGRfrl3MVhJsfpSew=.b02d2da2-3ce5-497d-a979-4e602c3834be@github.com> References: <2qAXnV7AsRJ0Z5Of9gkLOHIACVFGRfrl3MVhJsfpSew=.b02d2da2-3ce5-497d-a979-4e602c3834be@github.com> Message-ID: On Tue, 21 Dec 2021 19:50:51 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 18 -> JDK 19 This pull request has now been integrated. 
Changeset: 803cb8a7 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/803cb8a76827a21fcf9e033b4ca6a777c509169b Stats: 1086 lines in 80 files changed: 695 ins; 107 del; 284 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/6911 From jwilhelm at openjdk.java.net Tue Dec 21 22:02:00 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Tue, 21 Dec 2021 22:02:00 GMT Subject: RFR: Merge jdk18 [v2] In-Reply-To: <2qAXnV7AsRJ0Z5Of9gkLOHIACVFGRfrl3MVhJsfpSew=.b02d2da2-3ce5-497d-a979-4e602c3834be@github.com> References: <2qAXnV7AsRJ0Z5Of9gkLOHIACVFGRfrl3MVhJsfpSew=.b02d2da2-3ce5-497d-a979-4e602c3834be@github.com> Message-ID: <06_kDiLlkp6QNtnHGLkuV7TxzucQtDwRaGLnjWX0dpM=.a14aed52-214c-4723-be0c-6bf07ef3a6e0@github.com> > Forwardport JDK 18 -> JDK 19 Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 96 commits: - Merge - 8279043: Some Security Exception Messages Miss Spaces Reviewed-by: weijun - 8278793: Interpreter(x64) intrinsify Thread.currentThread() Reviewed-by: rkennke, dcubed, dholmes - 8278044: ObjectInputStream methods invoking the OIF.CFG.getSerialFilterFactory() silent about error cases. Reviewed-by: lancea, bpb - 8278087: Deserialization filter and filter factory property error reporting under specified Reviewed-by: lancea, bpb - 8278917: Use Prev Bitmap for recording evac failed objects Reviewed-by: ayang, mli, tschatzl - 8277893: Arraycopy stress tests Reviewed-by: kvn, mli - 8278893: Parallel: Remove GCWorkerDelayMillis Reviewed-by: ayang, mli - 8278953: Clarify Class.getDeclaredConstructor specification Reviewed-by: mchung, alanb - 8277100: Dynamic dump can inadvertently overwrite default CDS archive Reviewed-by: iklam, minqi, dholmes - ... 
and 86 more: https://git.openjdk.java.net/jdk/compare/1128674d...d5d5bad9 ------------- Changes: https://git.openjdk.java.net/jdk/pull/6911/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6911&range=01 Stats: 18631 lines in 509 files changed: 14435 ins; 3002 del; 1194 mod Patch: https://git.openjdk.java.net/jdk/pull/6911.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6911/head:pull/6911 PR: https://git.openjdk.java.net/jdk/pull/6911 From svkamath at openjdk.java.net Tue Dec 21 22:04:12 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 21 Dec 2021 22:04:12 GMT Subject: [jdk18] RFR: 8279045: Intrinsics missing vzeroupper instruction In-Reply-To: References: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> Message-ID: On Tue, 21 Dec 2021 19:48:07 GMT, Vladimir Kozlov wrote: >> Adding vzeroupper instruction to aes and shift intrinsics. > > Please, update to latest changes in JDK18 (there was changes pushed to stubs #19). > I started testing. @vnkozlov Thank you. ------------- PR: https://git.openjdk.java.net/jdk18/pull/52 From svkamath at openjdk.java.net Tue Dec 21 22:13:16 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 21 Dec 2021 22:13:16 GMT Subject: [jdk18] Integrated: 8279045: Intrinsics missing vzeroupper instruction In-Reply-To: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> References: <27V2F0jiu0-jjxsrjhA4U0idx3yY0d2zeMaN2_TDFcY=.92b56e97-d251-4f70-b3d2-a290913a8d59@github.com> Message-ID: On Tue, 21 Dec 2021 00:51:51 GMT, Smita Kamath wrote: > Adding vzeroupper instruction to aes and shift intrinsics. This pull request has now been integrated. 
Changeset: 9ee3ccfe Author: Smita Kamath Committer: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk18/commit/9ee3ccfee2c9cc54ac7dca49fbf35135e627ef18 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod 8279045: Intrinsics missing vzeroupper instruction Reviewed-by: neliasso, sviswanathan, kvn ------------- PR: https://git.openjdk.java.net/jdk18/pull/52 From sspitsyn at openjdk.java.net Tue Dec 21 22:24:20 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 21 Dec 2021 22:24:20 GMT Subject: RFR: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d In-Reply-To: References: Message-ID: On Mon, 20 Dec 2021 22:59:39 GMT, Coleen Phillimore wrote: > Thanks to @stefank and @fisk for the diagnosis. ZGC is looking at metadata in unloaded nmethods, but redefinition doesn't keep this metadata from being deallocated. Change the iterators that mark metadata in use to walk unloaded nmethods as well as other alive nmethods. > > The test case reproduces the crash on windows if run in an 100 iteration loop. This fix does not crash in the same test. Also ran tier1-6. Coleen, Thank you for fixing this issue! I've posted one question. Other than that it looks good to me. Thanks, Serguei src/hotspot/share/code/codeCache.cpp line 681: > 679: void CodeCache::metadata_do(MetadataClosure* f) { > 680: assert_locked_or_safepoint(CodeCache_lock); > 681: NMethodIterator iter(NMethodIterator::only_alive); The `CodeCache::metadata_do` is used in `MetadataOnStackMark::MetadataOnStackMark`. Besides `RedefineClasses` the `MetadataOnStackMark` is also used in `ClassLoaderDataGraph::walk_metadata_and_clean_metaspaces`. Is the change at L681 correct for `ClassLoaderDataGraph::walk_metadata_and_clean_metaspaces`as well? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6900 From coleenp at openjdk.java.net Tue Dec 21 22:44:19 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 21 Dec 2021 22:44:19 GMT Subject: RFR: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d In-Reply-To: References: Message-ID: <9Jpdfu1rIBVmxkObMQ84mszhBr9VBkKmK5Dk3pdXy4Y=.dda0e4c7-fae4-4b1a-ad52-2ee0030dea1f@github.com> On Tue, 21 Dec 2021 22:21:14 GMT, Serguei Spitsyn wrote: >> Thanks to @stefank and @fisk for the diagnosis. ZGC is looking at metadata in unloaded nmethods, but redefinition doesn't keep this metadata from being deallocated. Change the iterators that mark metadata in use to walk unloaded nmethods as well as other alive nmethods. >> >> The test case reproduces the crash on windows if run in an 100 iteration loop. This fix does not crash in the same test. Also ran tier1-6. > > Coleen, > Thank you for fixing this issue! > I've posted one question. Other than that it looks good to me. > Thanks, > Serguei @sspitsyn Thanks for looking at this. Can you review the JDK 18 PR instead? > src/hotspot/share/code/codeCache.cpp line 681: > >> 679: void CodeCache::metadata_do(MetadataClosure* f) { >> 680: assert_locked_or_safepoint(CodeCache_lock); >> 681: NMethodIterator iter(NMethodIterator::only_alive); > > The `CodeCache::metadata_do` is used in `MetadataOnStackMark::MetadataOnStackMark`. > Besides `RedefineClasses` the `MetadataOnStackMark` is also used in `ClassLoaderDataGraph::walk_metadata_and_clean_metaspaces`. > Is the change at L681 correct for `ClassLoaderDataGraph::walk_metadata_and_clean_metaspaces`as well? 
Yes, in ClassLoaderDataGraph::walk_metadata_and_clean_metaspaces, it calls the version of MetadataOnStackMark that doesn't call CodeCache::metadata_do - it only walks the metadata that was saved from the full redefinition walk: caller: MetadataOnStackMark md_on_stack(walk_all_metadata, /*redefinition_walk*/false); callee: if (redefinition_walk) { // We have to walk the whole code cache during redefinition. CodeCache::metadata_do(&md_on_stack); } else { CodeCache::old_nmethods_do(&md_on_stack); } During redefinition, we have to walk the unloaded nmethods as well as the alive nmethods. ------------- PR: https://git.openjdk.java.net/jdk/pull/6900 From coleenp at openjdk.java.net Tue Dec 21 22:44:19 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 21 Dec 2021 22:44:19 GMT Subject: RFR: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d In-Reply-To: <9Jpdfu1rIBVmxkObMQ84mszhBr9VBkKmK5Dk3pdXy4Y=.dda0e4c7-fae4-4b1a-ad52-2ee0030dea1f@github.com> References: <9Jpdfu1rIBVmxkObMQ84mszhBr9VBkKmK5Dk3pdXy4Y=.dda0e4c7-fae4-4b1a-ad52-2ee0030dea1f@github.com> Message-ID: <1aqWr3qToeOzPVDssRbOnIjRpPYhSXIXkXRaA0qu4fg=.149ca8cd-5ae3-4bf2-a447-dd22ec4e20eb@github.com> On Tue, 21 Dec 2021 22:40:17 GMT, Coleen Phillimore wrote: >> src/hotspot/share/code/codeCache.cpp line 681: >> >>> 679: void CodeCache::metadata_do(MetadataClosure* f) { >>> 680: assert_locked_or_safepoint(CodeCache_lock); >>> 681: NMethodIterator iter(NMethodIterator::only_alive); >> >> The `CodeCache::metadata_do` is used in `MetadataOnStackMark::MetadataOnStackMark`. >> Besides `RedefineClasses` the `MetadataOnStackMark` is also used in `ClassLoaderDataGraph::walk_metadata_and_clean_metaspaces`. >> Is the change at L681 correct for `ClassLoaderDataGraph::walk_metadata_and_clean_metaspaces`as well? 
> > Yes, in ClassLoaderDataGraph::walk_metadata_and_clean_metaspaces, it calls the version of MetadataOnStackMark that doesn't call CodeCache::metadata_do - it only walks the metadata that was saved from the full redefinition walk: > > caller: > MetadataOnStackMark md_on_stack(walk_all_metadata, /*redefinition_walk*/false); > > callee: > if (redefinition_walk) { > // We have to walk the whole code cache during redefinition. > CodeCache::metadata_do(&md_on_stack); > } else { > CodeCache::old_nmethods_do(&md_on_stack); > } > > During redefinition, we have to walk the unloaded nmethods as well as the alive nmethods. The nmethods in the table are removed when the unloaded nmethods are flushed so that the table doesn't contain stale entries. ------------- PR: https://git.openjdk.java.net/jdk/pull/6900 From sspitsyn at openjdk.java.net Tue Dec 21 22:57:17 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 21 Dec 2021 22:57:17 GMT Subject: RFR: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d In-Reply-To: References: Message-ID: On Mon, 20 Dec 2021 22:59:39 GMT, Coleen Phillimore wrote: > Thanks to @stefank and @fisk for the diagnosis. ZGC is looking at metadata in unloaded nmethods, but redefinition doesn't keep this metadata from being deallocated. Change the iterators that mark metadata in use to walk unloaded nmethods as well as other alive nmethods. > > The test case reproduces the crash on windows if run in an 100 iteration loop. This fix does not crash in the same test. Also ran tier1-6. You are right about MetadataOnStackMark. Yes, I'll review the 18 PR instead. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6900 From sspitsyn at openjdk.java.net Tue Dec 21 23:02:19 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 21 Dec 2021 23:02:19 GMT Subject: [jdk18] RFR: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d In-Reply-To: References: Message-ID: On Tue, 21 Dec 2021 20:51:04 GMT, Coleen Phillimore wrote: > This is the fix for https://github.com/openjdk/jdk/pull/6900 retargeted to JDK 18. > > Thanks to @stefank and @fisk for the diagnosis. ZGC is looking at metadata in unloaded nmethods, but redefinition doesn't keep this metadata from being deallocated. Change the iterators that mark metadata in use to walk unloaded nmethods as well as other alive nmethods. > > The test case reproduces the crash on Windows if run in a 100-iteration loop. With this fix, the same test does not crash. Also ran tier1-6. > > Above testing in progress. Marked as reviewed by sspitsyn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk18/pull/63 From mli at openjdk.java.net Wed Dec 22 02:33:37 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 22 Dec 2021 02:33:37 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure [v3] In-Reply-To: References: Message-ID: > The original PR is at #6763, which should be retired because we have decided to adjust part of the optimization solution for evacuation failure (see #6627 for details), so the log will be adjusted accordingly. > With this patch, the basic log related to evacuation failure will look like the following. > > > [13.126s][debug][gc,phases] GC(0) Restore Retained Regions (ms): Min: 0.0, Avg: 197.4, Max: 1579.1, Diff: 1579.1, Sum: 1579.1, Workers: 8 > [13.126s][debug][gc,phases] GC(0) Evacuation Failure Regions: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 1, Workers: 1 Hamlin Li has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains three commits: - Merge branch 'master' into log-evac-failure-region-num - Fix test - Initial commit ------------- Changes: https://git.openjdk.java.net/jdk/pull/6860/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6860&range=02 Stats: 23 lines in 6 files changed: 16 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/6860.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6860/head:pull/6860 PR: https://git.openjdk.java.net/jdk/pull/6860 From mli at openjdk.java.net Wed Dec 22 02:33:42 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 22 Dec 2021 02:33:42 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure [v2] In-Reply-To: References: Message-ID: On Mon, 20 Dec 2021 02:38:57 GMT, Hamlin Li wrote: >> The original pr is at #6763 , which should be retired as we have decided to adjust part of optimization solution for evacuation failure (see #6627 for details), so the log will be adjusted accordiingly. >> The basic log related to evacuation failed will looks like below based on this patch. >> >> >> [13.126s][debug][gc,phases] GC(0) Restore Retained Regions (ms): Min: 0.0, Avg: 197.4, Max: 1579.1, Diff: 1579.1, Sum: 1579.1, Workers: 8 >> [13.126s][debug][gc,phases] GC(0) Evacuation Failure Regions: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 1, Workers: 1 > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix test Can I have another reviewer? 
Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/6860 From serb at openjdk.java.net Wed Dec 22 10:57:37 2021 From: serb at openjdk.java.net (Sergey Bylokhov) Date: Wed, 22 Dec 2021 10:57:37 GMT Subject: RFR: 8279134: Fix Amazon copyright in various files Message-ID: This bug is similar to https://bugs.openjdk.java.net/browse/JDK-8244094 Currently, some of the files in the OpenJDK repo have Amazon copyright notices which are all slightly different and do not conform to Amazons preferred copyright notice which is simply (intentionally without copyright year): "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved." @simonis @phohensee ------------- Commit messages: - Initial fix JDK-8279134 Changes: https://git.openjdk.java.net/jdk/pull/6915/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6915&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8279134 Stats: 15 lines in 14 files changed: 0 ins; 1 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/6915.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6915/head:pull/6915 PR: https://git.openjdk.java.net/jdk/pull/6915 From mli at openjdk.java.net Wed Dec 22 11:59:19 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 22 Dec 2021 11:59:19 GMT Subject: RFR: 8278282: G1: Log basic statistics of evacuation failure [v3] In-Reply-To: References: Message-ID: On Wed, 22 Dec 2021 02:33:37 GMT, Hamlin Li wrote: >> The original pr is at #6763 , which should be retired as we have decided to adjust part of optimization solution for evacuation failure (see #6627 for details), so the log will be adjusted accordiingly. >> The basic log related to evacuation failed will looks like below based on this patch. 
>> >> >> [13.126s][debug][gc,phases] GC(0) Restore Retained Regions (ms): Min: 0.0, Avg: 197.4, Max: 1579.1, Diff: 1579.1, Sum: 1579.1, Workers: 8 >> [13.126s][debug][gc,phases] GC(0) Evacuation Failure Regions: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 1, Workers: 1 > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge branch 'master' into log-evac-failure-region-num > - Fix test > - Initial commit Can I have another reviewer? Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/6860 From mli at openjdk.java.net Wed Dec 22 12:43:17 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 22 Dec 2021 12:43:17 GMT Subject: RFR: 8278282: G1: Log basic statistics for evacuation failure [v3] In-Reply-To: References: Message-ID: On Wed, 22 Dec 2021 02:33:37 GMT, Hamlin Li wrote: >> The original pr is at #6763 , which should be retired as we have decided to adjust part of optimization solution for evacuation failure (see #6627 for details), so the log will be adjusted accordiingly. >> The basic log related to evacuation failed will looks like below based on this patch. >> >> >> [13.126s][debug][gc,phases] GC(0) Restore Retained Regions (ms): Min: 0.0, Avg: 197.4, Max: 1579.1, Diff: 1579.1, Sum: 1579.1, Workers: 8 >> [13.126s][debug][gc,phases] GC(0) Evacuation Failure Regions: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 1, Workers: 1 > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge branch 'master' into log-evac-failure-region-num > - Fix test > - Initial commit Can I have another reviewer? 
Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/6860 From jwilhelm at openjdk.java.net Wed Dec 22 16:12:51 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 22 Dec 2021 16:12:51 GMT Subject: RFR: Merge jdk18 Message-ID: <3lthdMVqcFajPqeEx-jq-zEJrqH9TepzZcwKuANCPmA=.fdf1ba32-4e8c-4d40-8568-4467b9e82993@github.com> Forwardport JDK 18 -> JDK 19 ------------- Commit messages: - Merge - 8274315: JFR: One closed state per file or stream - 8271447: java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters - 8278987: RunThese24H.java failed with EXCEPTION_ACCESS_VIOLATION in __write_sample_info__ - 8279007: jstatd fails to start because SecurityManager is disabled - 8278508: Enable X86 maskAll instruction pattern for 32 bit JVM. - 8279045: Intrinsics missing vzeroupper instruction The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=6918&range=00.0 - jdk18: https://webrevs.openjdk.java.net/?repo=jdk&pr=6918&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/6918/files Stats: 260 lines in 21 files changed: 170 ins; 55 del; 35 mod Patch: https://git.openjdk.java.net/jdk/pull/6918.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6918/head:pull/6918 PR: https://git.openjdk.java.net/jdk/pull/6918 From jwilhelm at openjdk.java.net Wed Dec 22 16:51:03 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 22 Dec 2021 16:51:03 GMT Subject: RFR: Merge jdk18 [v2] In-Reply-To: <3lthdMVqcFajPqeEx-jq-zEJrqH9TepzZcwKuANCPmA=.fdf1ba32-4e8c-4d40-8568-4467b9e82993@github.com> References: <3lthdMVqcFajPqeEx-jq-zEJrqH9TepzZcwKuANCPmA=.fdf1ba32-4e8c-4d40-8568-4467b9e82993@github.com> Message-ID: > Forwardport JDK 18 -> JDK 19 Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 104 commits: - Merge - 8279063: Consolidate push and push_if_necessary in PreservedMarks Reviewed-by: rkennke, mli, tschatzl - 8279024: Remove javascript references from clhsdb.html Reviewed-by: kevinw, sspitsyn - Merge - 8244670: convert clhsdb "whatis" command from javascript to java Reviewed-by: sspitsyn, kevinw - 8279066: entries.remove(entry) is useless in PKCS12KeyStore Reviewed-by: mullan - Merge - 8279060: Parallel: Remove unused PSVirtualSpace constructors Reviewed-by: mli, sjohanss, tschatzl - 8278396: G1: Initialize the BOT threshold to be region bottom Reviewed-by: tschatzl, sjohanss - 8279043: Some Security Exception Messages Miss Spaces Reviewed-by: weijun - ... and 94 more: https://git.openjdk.java.net/jdk/compare/dfb15c3e...70630b7b ------------- Changes: https://git.openjdk.java.net/jdk/pull/6918/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6918&range=01 Stats: 18775 lines in 519 files changed: 14457 ins; 3098 del; 1220 mod Patch: https://git.openjdk.java.net/jdk/pull/6918.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6918/head:pull/6918 PR: https://git.openjdk.java.net/jdk/pull/6918 From jwilhelm at openjdk.java.net Wed Dec 22 16:51:06 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 22 Dec 2021 16:51:06 GMT Subject: Integrated: Merge jdk18 In-Reply-To: <3lthdMVqcFajPqeEx-jq-zEJrqH9TepzZcwKuANCPmA=.fdf1ba32-4e8c-4d40-8568-4467b9e82993@github.com> References: <3lthdMVqcFajPqeEx-jq-zEJrqH9TepzZcwKuANCPmA=.fdf1ba32-4e8c-4d40-8568-4467b9e82993@github.com> Message-ID: On Wed, 22 Dec 2021 16:03:43 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 18 -> JDK 19 This pull request has now been integrated. 
Changeset: f1fbba23 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/f1fbba23ebdb28a32977241f8e85b60e10878cbc Stats: 260 lines in 21 files changed: 170 ins; 55 del; 35 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/6918 From coleenp at openjdk.java.net Wed Dec 22 17:09:15 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 22 Dec 2021 17:09:15 GMT Subject: [jdk18] RFR: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d In-Reply-To: References: Message-ID: <4smM3KY3IgoZOuA3IGp03WVkNV9kkPiWEI5ciwJqHfw=.d09ae931-c8c3-4881-8508-4cf80e91bfee@github.com> On Tue, 21 Dec 2021 20:51:04 GMT, Coleen Phillimore wrote: > This is the fix for https://github.com/openjdk/jdk/pull/6900 retargeted to JDK 18. > > Thanks to @stefank and @fisk for the diagnosis. ZGC is looking at metadata in unloaded nmethods, but redefinition doesn't keep this metadata from being deallocated. Change the iterators that mark metadata in use to walk unloaded nmethods as well as other alive nmethods. > > The test case reproduces the crash on windows if run in an 100 iteration loop. This fix does not crash in the same test. Also ran tier1-6. > > Above testing in progress. Thanks for reviewing, Vladimir and Serguei. ------------- PR: https://git.openjdk.java.net/jdk18/pull/63 From coleenp at openjdk.java.net Wed Dec 22 17:22:19 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 22 Dec 2021 17:22:19 GMT Subject: [jdk18] Integrated: 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d In-Reply-To: References: Message-ID: On Tue, 21 Dec 2021 20:51:04 GMT, Coleen Phillimore wrote: > This is the fix for https://github.com/openjdk/jdk/pull/6900 retargeted to JDK 18. > > Thanks to @stefank and @fisk for the diagnosis. 
ZGC is looking at metadata in unloaded nmethods, but redefinition doesn't keep this metadata from being deallocated. Change the iterators that mark metadata in use to walk unloaded nmethods as well as other alive nmethods. > > The test case reproduces the crash on windows if run in an 100 iteration loop. This fix does not crash in the same test. Also ran tier1-6. > > Above testing in progress. This pull request has now been integrated. Changeset: 2be3e7ef Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk18/commit/2be3e7ef1cff1aae6faf1f4f0545d561af48d0ba Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d Reviewed-by: kvn, sspitsyn, eosterlund ------------- PR: https://git.openjdk.java.net/jdk18/pull/63 From mli at openjdk.java.net Thu Dec 23 02:25:18 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 23 Dec 2021 02:25:18 GMT Subject: RFR: 8278282: G1: Log basic statistics for evacuation failure [v3] In-Reply-To: References: Message-ID: <-l7sfJEjl--1H5nvWXHlomkVvbOg8uTOGw7tNRTn7GM=.416c010d-cc5d-41ca-bed6-3048202d317d@github.com> On Wed, 22 Dec 2021 02:33:37 GMT, Hamlin Li wrote: >> The original pr is at #6763 , which should be retired as we have decided to adjust part of optimization solution for evacuation failure (see #6627 for details), so the log will be adjusted accordiingly. >> The basic log related to evacuation failed will looks like below based on this patch. >> >> >> [13.126s][debug][gc,phases] GC(0) Restore Retained Regions (ms): Min: 0.0, Avg: 197.4, Max: 1579.1, Diff: 1579.1, Sum: 1579.1, Workers: 8 >> [13.126s][debug][gc,phases] GC(0) Evacuation Failure Regions: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 1, Workers: 1 > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: > > - Merge branch 'master' into log-evac-failure-region-num > - Fix test > - Initial commit Can I have another reviewer? Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/6860 From yyang at openjdk.java.net Thu Dec 23 02:42:41 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Thu, 23 Dec 2021 02:42:41 GMT Subject: [jdk18] RFR: 8278125: Some preallocated OOMEs are missing stack trace Message-ID: 8278125: Some preallocated OOMEs are missing stack trace ------------- Commit messages: - Backport ad1dc9c2ae5463363aff20072a3f2ca4ea23acd2 Changes: https://git.openjdk.java.net/jdk18/pull/67/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=67&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278125 Stats: 88 lines in 4 files changed: 85 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk18/pull/67.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/67/head:pull/67 PR: https://git.openjdk.java.net/jdk18/pull/67 From dholmes at openjdk.java.net Thu Dec 23 02:53:11 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 23 Dec 2021 02:53:11 GMT Subject: [jdk18] RFR: 8278125: Some preallocated OOMEs are missing stack trace In-Reply-To: References: Message-ID: <4HrfgKz8bcN2muylmH57MjW8T7N6d6Ovdiki3LkbDVY=.b51f477b-7ce5-41bd-ab41-6c7f34ebe008@github.com> On Thu, 23 Dec 2021 02:32:58 GMT, Yi Yang wrote: > 8278125: Some preallocated OOMEs are missing stack trace Sorry this is a P4 bug (more an enhancement really) and as such is not suitable for inclusion in JDK 18 after RDP1 has started. You could target to 18.0.1 instead. 
Thanks, David ------------- PR: https://git.openjdk.java.net/jdk18/pull/67 From yyang at openjdk.java.net Thu Dec 23 02:59:14 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Thu, 23 Dec 2021 02:59:14 GMT Subject: [jdk18] RFR: 8278125: Some preallocated OOMEs are missing stack trace In-Reply-To: <4HrfgKz8bcN2muylmH57MjW8T7N6d6Ovdiki3LkbDVY=.b51f477b-7ce5-41bd-ab41-6c7f34ebe008@github.com> References: <4HrfgKz8bcN2muylmH57MjW8T7N6d6Ovdiki3LkbDVY=.b51f477b-7ce5-41bd-ab41-6c7f34ebe008@github.com> Message-ID: On Thu, 23 Dec 2021 02:50:03 GMT, David Holmes wrote: > Sorry this is a P4 bug (more an enhancement really) and as such is not suitable for inclusion in JDK 18 after RDP1 has started. You could target to 18.0.1 instead. > > Thanks, David Okay, I will redo this when jdk18 is released... ------------- PR: https://git.openjdk.java.net/jdk18/pull/67 From yyang at openjdk.java.net Thu Dec 23 02:59:14 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Thu, 23 Dec 2021 02:59:14 GMT Subject: [jdk18] Withdrawn: 8278125: Some preallocated OOMEs are missing stack trace In-Reply-To: References: Message-ID: <0n4JGXMnH9PiP-5jRqX71Hse3Hf23nXf0EdLyxksMxQ=.b061afcd-8797-4cad-ac2c-5cfa9a822ac8@github.com> On Thu, 23 Dec 2021 02:32:58 GMT, Yi Yang wrote: > 8278125: Some preallocated OOMEs are missing stack trace This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk18/pull/67 From dholmes at openjdk.java.net Thu Dec 23 04:18:11 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 23 Dec 2021 04:18:11 GMT Subject: [jdk18] RFR: 8278125: Some preallocated OOMEs are missing stack trace In-Reply-To: References: Message-ID: On Thu, 23 Dec 2021 02:32:58 GMT, Yi Yang wrote: > 8278125: Some preallocated OOMEs are missing stack trace You can do it now against 18u: https://github.com/openjdk/jdk18u ------------- PR: https://git.openjdk.java.net/jdk18/pull/67 From ddong at openjdk.java.net Thu Dec 23 08:10:10 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Thu, 23 Dec 2021 08:10:10 GMT Subject: RFR: 8277948: AArch64: Print the correct native stack if -XX:+PreserveFramePointer when crash In-Reply-To: References: Message-ID: <0A94uQQwqcWWpN-zu5q4Dd1mOOnPnBswvQcBQma5Zqs=.6281a912-8791-490e-9c65-c3d92302a034@github.com> On Tue, 7 Dec 2021 15:37:28 GMT, Andrew Haley wrote: >>> > Thank you for this. I'll have a look. >>> > Stack unwinding on AArch64 C/C++ uses call frame information, which is in a separate section in the binary file. This allows the stack to be fully traced, even if there is no frame pointer. There is a library, libunwind, which does this. But that won't work with Java, which has its own way to do it >>> > It would be nice to get -XX:+PreserveFramePointer working correctly. >>> >>> Thanks for the response. >>> >>> I also noticed that a java method will reserve 2 words when this method makes a vm leaf call: >> >> It's not reserving anything, it's saving the PC for the stack unwinder. 
>> >>> ``` >>> aarch64.ad >>> >>> aarch64_enc_java_to_runtime >>> >>> Label retaddr; >>> __ adr(rscratch2, retaddr); >>> __ lea(rscratch1, RuntimeAddress(entry)); >>> // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc() >>> __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); >>> __ blr(rscratch1); >>> __ bind(retaddr); >>> __ add(sp, sp, 2 * wordSize); >>> ``` >> >> I wrote it. If you look at `JavaFrameAnchor::capture_last_Java_pc()` you'll see >> it being used. >> >>> ``` >>> MacroAssembler::call_VM_leaf_base >>> >>> >>> stp(rscratch1, rmethod, Address(pre(sp, -2 * wordSize))); >>> >>> mov(rscratch1, entry_point); >>> blr(rscratch1); >>> if (retaddr) >>> bind(*retaddr); >>> >>> ldp(rscratch1, rmethod, Address(post(sp, 2 * wordSize))); >>> ``` >>> >>> I haven't figured out the specific purpose of this operation, but I think it will make the logic of stack walking more complicated. >> >> All this is doing is saving `rmethod` (which is in a call-clobbered register) around a VM call. `retaddr` is saved for OOP maps. > >> Hi @theRealAph , Sorry to disturb you again, I have one more question. >> >> Under the current implementation, if the number of parameters of callee exceeds the number of parameter registers, the parameters on the stack cannot be read correctly, right? >> >> ``` >> aarch64.ad >> >> aarch64_enc_java_to_runtime >> >> Label retaddr; >> __ adr(rscratch2, retaddr); >> __ lea(rscratch1, RuntimeAddress(entry)); >> // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc() >> __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); >> __ blr(rscratch1); >> __ bind(retaddr); >> __ add(sp, sp, 2 * wordSize); >> ``` > > I think that's right, but there are no runtime calls with so many arguments, and I don't think there are likely to be. We should perhaps assert that. Hi @theRealAph , Do you have any comments on this patch? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6597 From duke at openjdk.java.net Thu Dec 23 08:55:47 2021 From: duke at openjdk.java.net (xpbob) Date: Thu, 23 Dec 2021 08:55:47 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v9] In-Reply-To: References: Message-ID: > Unsafe is used in many Java frameworks. > When the framework has a unsafe memory leak , there is no way to know what code is causing it. > Add unsafe allocation event to jfr. > Records the size and stack allocated. > This event is off by default xpbob has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8277930' of https://github.com/xpbob/jdk into JDK-8277930 - add threshold ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6591/files - new: https://git.openjdk.java.net/jdk/pull/6591/files/bf3a9c28..4fb24134 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=07-08 Stats: 6 lines in 2 files changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591 PR: https://git.openjdk.java.net/jdk/pull/6591 From aph at openjdk.java.net Thu Dec 23 11:07:15 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 23 Dec 2021 11:07:15 GMT Subject: RFR: 8277948: AArch64: Print the correct native stack if -XX:+PreserveFramePointer when crash In-Reply-To: References: Message-ID: On Tue, 7 Dec 2021 15:37:28 GMT, Andrew Haley wrote: >>> > Thank you for this. I'll have a look. >>> > Stack unwinding on AArch64 C/C++ uses call frame information, which is in a separate section in the binary file. This allows the stack to be fully traced, even if there is no frame pointer. There is a library, libunwind, which does this. 
But that won't work with Java, which has its own way to do it >>> > It would be nice to get -XX:+PreserveFramePointer working correctly. >>> >>> Thanks for the response. >>> >>> I also noticed that a java method will reserve 2 words when this method makes a vm leaf call: >> >> It's not reserving anything, it's saving the PC for the stack unwinder. >> >>> ``` >>> aarch64.ad >>> >>> aarch64_enc_java_to_runtime >>> >>> Label retaddr; >>> __ adr(rscratch2, retaddr); >>> __ lea(rscratch1, RuntimeAddress(entry)); >>> // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc() >>> __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); >>> __ blr(rscratch1); >>> __ bind(retaddr); >>> __ add(sp, sp, 2 * wordSize); >>> ``` >> >> I wrote it. If you look at `JavaFrameAnchor::capture_last_Java_pc()` you'll see >> it being used. >> >>> ``` >>> MacroAssembler::call_VM_leaf_base >>> >>> >>> stp(rscratch1, rmethod, Address(pre(sp, -2 * wordSize))); >>> >>> mov(rscratch1, entry_point); >>> blr(rscratch1); >>> if (retaddr) >>> bind(*retaddr); >>> >>> ldp(rscratch1, rmethod, Address(post(sp, 2 * wordSize))); >>> ``` >>> >>> I haven't figured out the specific purpose of this operation, but I think it will make the logic of stack walking more complicated. >> >> All this is doing is saving `rmethod` (which is in a call-clobbered register) around a VM call. `retaddr` is saved for OOP maps. > >> Hi @theRealAph , Sorry to disturb you again, I have one more question. >> >> Under the current implementation, if the number of parameters of callee exceeds the number of parameter registers, the parameters on the stack cannot be read correctly, right? 
>> >> ``` >> aarch64.ad >> >> aarch64_enc_java_to_runtime >> >> Label retaddr; >> __ adr(rscratch2, retaddr); >> __ lea(rscratch1, RuntimeAddress(entry)); >> // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc() >> __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); >> __ blr(rscratch1); >> __ bind(retaddr); >> __ add(sp, sp, 2 * wordSize); >> ``` > > I think that's right, but there are no runtime calls with so many arguments, and I don't think there are likely to be. We should perhaps assert that. > Hi @theRealAph , > > Do you have any comments on this patch? Hmm, it's a tricky one. Your solution might be the best that can be done at present, but it doesn't make me feel very comfortable. I think I need to have a look at it later, probably in the new year. Please feel free to remind me then. ------------- PR: https://git.openjdk.java.net/jdk/pull/6597 From coleenp at openjdk.java.net Thu Dec 23 15:15:14 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 23 Dec 2021 15:15:14 GMT Subject: [jdk18] RFR: 8278125: Some preallocated OOMEs are missing stack trace In-Reply-To: References: Message-ID: On Thu, 23 Dec 2021 02:32:58 GMT, Yi Yang wrote: > 8278125: Some preallocated OOMEs are missing stack trace I don't see why it should be done in 18u? The priority wasn't that high. It should be done on JDK 19 and then backported to JDK 17 (LTS release) and 11 makes more sense to me. ------------- PR: https://git.openjdk.java.net/jdk18/pull/67 From boris.ulasevich at bell-sw.com Thu Dec 23 15:58:25 2021 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Thu, 23 Dec 2021 18:58:25 +0300 Subject: RFC: improving NMethod code locality in CodeCache In-Reply-To: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com> References: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com> Message-ID: <7b2a8221-ef48-2287-3c49-b6b7f3d10116@bell-sw.com> Hi Evgeny, Thank you for sharing the data. It is very detailed and well structured. 
It is indeed interesting that the code itself takes ~1/2 of the volume and sometimes even less. So judging from the numbers, we can (theoretically) double the code density. I agree that it is worth doing.

You say [1] that branch prediction hardware can become overloaded in the case of 15K compiled methods. In your numbers, I see the maximum is 7K methods ~ 50MB (on the Renaissance benchmark). This is quite a load, yes. Also, on the aws-graviton-getting-started page [2] we see that the recommended CodeCacheSize value is 64M; more than that has a performance impact. These cases may also differ in the contents of the code cache: I guess it's tiered compilation in the benchmarks and non-tiered C2 in [2].

My questions are:
- What is the typical CodeCache size for real-world applications? Is it common for the CodeCache to reach hundreds of megabytes? Can it be simulated with benchmarks?
- I am not sure that branch predictors are often limited to a certain amount of memory, which is much less than the possible size of the code. There are now 3 generations of AWS Graviton HW. Do you observe the same branch prediction and code cache size effects on all three?
- What does the maximum CodeCache limit mean: is it the distance from the first method to the last? Will it help if C2 puts the metadata and other data on the next page after the instructions page? I mean it is worth putting them not too far from each other.

Besides the code density issue in the case of a limited CodeCache size (either a small amount of memory or a limitation of the branch predictor), I believe it makes sense to work with the Sweeper so that it actively removes cold methods from the CodeCache (see the Hotness Code picture on Page 65, [3]). After the virtual machine warms up, the compiler threads are idle anyway. In general, a GC-like approach can be applied to the CodeCache to make it clean, small and hot.
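The "double the code density" estimate can be checked with a little arithmetic. The sketch below is illustrative only: `density_gain` and `print_estimates` are hypothetical helper names, and the inputs are the median section shares quoted in the original proposal for DaCapo on arm64 (instrs 44.5%, stub code 10.1%). It assumes that moving every other section elsewhere leaves the remaining bytes contiguous, which is an idealisation.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: how much denser the code becomes if only
// `kept_share` (a fraction in (0, 1]) of each nmethod stays in place
// and everything else is moved out of the way.
double density_gain(double kept_share) {
  assert(kept_share > 0.0 && kept_share <= 1.0);
  return 1.0 / kept_share;
}

// Prints the two estimates for the DaCapo arm64 medians from the proposal.
void print_estimates() {
  const double instrs = 0.445;  // median share of main code
  const double stubs = 0.101;   // median share of stub code
  std::printf("main code only:   %.2fx\n", density_gain(instrs));
  std::printf("main + stub code: %.2fx\n", density_gain(instrs + stubs));
}
```

Under these assumptions the gain is roughly 2.2x when only the main code stays in place and roughly 1.8x when the stub code stays with it, so "about double" looks like a fair upper-level estimate for arm64.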
thanks,
Boris

[1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-November/056198.html
[2] https://github.com/aws/aws-graviton-getting-started/blob/main/java.md
[3] http://cr.openjdk.java.net/~thartmann/papers/2014-Code_Cache_Optimizations-thesis.pdf

11/23/2021 8:34 PM, Astigeevich, Evgeny wrote:
> Hello,
>
> We'd like to discuss a proposal for improving NMethod code locality in CodeCache.
>
> We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.
>
> The current NMethod layout is continuous and consists of the following sections:
> * Header: This is C++ part of NMethod: class members and other C++ stuff. Its size is "sizeof(NMethod)". Jdk17 arm64 has it to be 344 bytes. On x86_64 it is 352 bytes.
> * Relocation
> * Constant pool
> * Instructions (main code)
> * Stub code
> * Oops
> * Metadata: Class related metadata
> * Scopes data: Debugging information
> * Scopes pcs: Debugging information
> * Dependencies
> * Handler table: Exception handler table
> * Nul chk table: Implicit Null Pointer exception table
> * Speculations
> * JVMCI data
>
> We collected the section sizes of C2 nmethods in the DaCapo and Renaissance benchmarks on x86_64 and arm64. The C2 methods were got with -XX:+LogCompilation.
> Summary of results for jdk17 with tiered compilation:
> * DaCapo:
>   * arm64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_arm64.csv):
> +---------------------+---------+------------+-----------+
> |                     | min     | max        | median    |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 152     | 5215       | 916       |
> | Total size - bytes  | 271,576 | 38,367,872 | 4,072,616 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> | Section       | min   | max   | median |
> +---------------+-------+-------+--------+
> | header        | 4.7%  | 19.3% | 8.0%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 39.7% | 49.7% | 44.5%  |
> | stub code     | 8.9%  | 11.3% | 10.1%  |
> | oops          | 0.2%  | 0.4%  | 0.3%   |
> | metadata      | 2.0%  | 3.0%  | 2.3%   |
> | scopes data   | 12.2% | 18.6% | 15.9%  |
> | scopes pcs    | 7.8%  | 9.0%  | 8.4%   |
> | deps          | 0.3%  | 0.8%  | 0.5%   |
> | handler table | 1.3%  | 3.3%  | 2.1%   |
> | nul_chk table | 1.0%  | 1.6%  | 1.6%   |
> +---------------+-------+-------+--------+
>
>   * x86_64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_x86_64.csv):
> +---------------------+---------+------------+-----------+
> |                     | min     | max        | median    |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 155     | 5135       | 889       |
> | Total size - bytes  | 264,800 | 35,026,312 | 3,985,744 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> | Section       | min   | max   | median |
> +---------------+-------+-------+--------+
> | header        | 5.2%  | 20.6% | 8.3%   |
> | consts        | 0.0%  | 0.6%  | 0.1%   |
> | instrs        | 49.2% | 60.7% | 55.3%  |
> | stub code     | 1.1%  | 1.9%  | 1.4%   |
> | oops          | 0.1%  | 0.3%  | 0.2%   |
> | metadata      | 1.6%  | 2.9%  | 2.0%   |
> | scopes data   | 12.2% | 19.6% | 16.8%  |
> | scopes pcs    | 7.8%  | 9.2%  | 8.5%   |
> | deps          | 0.3%  | 0.8%  | 0.5%   |
> | handler table | 1.5%  | 3.5%  | 2.0%   |
> | nul_chk table | 0.9%  | 1.6%  | 1.1%   |
> +---------------+-------+-------+--------+
>
> * Renaissance
>   * arm64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_arm64.csv):
> +---------------------+---------+------------+-----------+
> |                     | min     | max        | median    |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 155     | 7447       | 1198      |
> | Total size - bytes  | 366,248 | 52,840,528 | 4,989,392 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> | Section       | min   | max   | median |
> +---------------+-------+-------+--------+
> | header        | 4.8%  | 14.6% | 8.5%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 35.7% | 45.6% | 42.8%  |
> | stub code     | 8.3%  | 12.0% | 10.1%  |
> | oops          | 0.2%  | 0.6%  | 0.4%   |
> | metadata      | 2.0%  | 4.1%  | 3.0%   |
> | scopes data   | 12.4% | 20.8% | 16.1%  |
> | scopes pcs    | 7.8%  | 8.9%  | 8.4%   |
> | deps          | 0.4%  | 1.0%  | 0.5%   |
> | handler table | 1.2%  | 3.9%  | 2.4%   |
> | nul_chk table | 0.9%  | 1.3%  | 1.1%   |
> +---------------+-------+-------+--------+
>
>   * x86_64 (full data: https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_x86_64.csv):
>
> +---------------------+---------+------------+-----------+
> |                     | min     | max        | median    |
> +---------------------+---------+------------+-----------+
> | C2 nmethods         | 158     | 7242       | 938       |
> | Total size - bytes  | 354,952 | 47,019,560 | 3,791,764 |
> +---------------------+---------+------------+-----------+
>
> Proportion of the total size of a section vs C2 nmethods total size
>
> +---------------+-------+-------+--------+
> | Section       | min   | max   | median |
> +---------------+-------+-------+--------+
> | header        | 5.4%  | 15.7% | 9.7%   |
> | consts        | 0.0%  | 0.1%  | 0.0%   |
> | instrs        | 46.1% | 54.4% | 52.7%  |
> | stub code     | 1.3%  | 1.9%  | 1.4%   |
> | oops          | 0.2%  | 0.5%  | 0.3%   |
> | metadata      | 1.9%  | 3.4%  | 2.6%   |
> | scopes data   | 12.7% | 23.6% | 17.4%  |
> | scopes pcs    | 8.0%  | 9.4%  | 8.6%   |
> | deps          | 0.4%  | 1.0%  | 0.5%   |
> | handler table | 1.3%  | 4.0%  | 2.5%   |
> | nul_chk table | 1.0%  | 1.4%  | 1.2%   |
> +---------------+-------+-------+--------+
>
> The data show that due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show the most contributors of non-executable data are the header and scopes sections. Arm64 vs x86_64 looks consistent except the stub code. On arm64 the size of the stub code is 4-5 times bigger.
>
> We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data. According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache. Different parts of memory for a nmethod need to be allocated/released.
>
> There is JDK-7072317 "move metadata from CodeCache" (https://bugs.openjdk.java.net/browse/JDK-7072317) which the implementation works can be done under.
>
> There can be different approaches for the implementation:
>
> 1. What to separate:
>    a. All code (main plus stub) from other sections.
>    b. Or only main code because this is the code where an application should spend most of the time.
>    c. Or the header and scope sections.
> 2. Where to put:
>    a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin.
>    b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset.
>    c. Or in a completely different place (C-heap, Metaspace,...)
>
> It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property.
>
> We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317.
>
> Comments welcome!
>
> Thanks,
> Evgeny Astigeevich, AWS Corretto Team
>
>
>
>
> Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.
>
>

From duke at openjdk.java.net Thu Dec 23 16:05:31 2021
From: duke at openjdk.java.net (Quan Anh Mai)
Date: Thu, 23 Dec 2021 16:05:31 GMT
Subject: RFR: 8279143: Undefined behaviours in globalDefinitions.hpp
Message-ID:

Hi,

This patch replaces undefined behaviours in globalDefinitions.hpp by proper well-defined ones.

Thank you very much.
------------- Commit messages: - clean - Merge branch 'master' into undefinedBehaviour - Merge branch 'master' of github.com:MeryKitty/jdk into undefinedBehaviour - implementation limits - const reference - words not need to be initialized - undefined behaviour in globalDefinitions.hpp Changes: https://git.openjdk.java.net/jdk/pull/6930/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6930&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8279143 Stats: 44 lines in 4 files changed: 14 ins; 9 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/6930.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6930/head:pull/6930 PR: https://git.openjdk.java.net/jdk/pull/6930 From duke at openjdk.java.net Thu Dec 23 16:33:49 2021 From: duke at openjdk.java.net (Quan Anh Mai) Date: Thu, 23 Dec 2021 16:33:49 GMT Subject: RFR: 8279143: Undefined behaviours in globalDefinitions.hpp [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch replaces undefined behaviours in globalDefinitions.hpp by proper well-defined ones. > > Thank you very much. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6930/files - new: https://git.openjdk.java.net/jdk/pull/6930/files/3960982e..c84bc84f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6930&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6930&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6930.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6930/head:pull/6930 PR: https://git.openjdk.java.net/jdk/pull/6930 From coleenp at openjdk.java.net Thu Dec 23 17:07:15 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 23 Dec 2021 17:07:15 GMT Subject: [jdk18] RFR: 8278125: Some preallocated OOMEs are missing stack trace In-Reply-To: References: Message-ID: On Thu, 23 Dec 2021 02:32:58 GMT, Yi Yang wrote: > 8278125: Some preallocated OOMEs are missing stack trace At any rate, the initial fix should be made to JDK 19 and then backported to 18u, 17 and 11, if approved. The code is going to be different in JDK 11, need to add the strings to the Universe::oops_do function. 
------------- PR: https://git.openjdk.java.net/jdk18/pull/67 From jwilhelm at openjdk.java.net Thu Dec 23 17:19:53 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 23 Dec 2021 17:19:53 GMT Subject: RFR: Merge jdk18 Message-ID: <6P4qa0-ey3ccmAWsQvfIev6IB087sfu4lpYtowgR1gE=.1daef932-9673-4d89-98b2-81a35c04e36f@github.com> Forwardport JDK 18 -> JDK 19 ------------- Commit messages: - Merge - 8279204: [BACKOUT] JDK-8278413: C2 crash when allocating array of size too large - 8268297: jdk/jfr/api/consumer/streaming/TestLatestEvent.java times out - 8279076: C2: Bad AD file when matching SqrtF with UseSSE=0 - 8278967: rmiregistry fails to start because SecurityManager is disabled - 8278239: vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine failed with EXCEPTION_ACCESS_VIOLATION at 0x000000000000000d The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=6931&range=00.0 - jdk18: https://webrevs.openjdk.java.net/?repo=jdk&pr=6931&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/6931/files Stats: 299 lines in 17 files changed: 141 ins; 126 del; 32 mod Patch: https://git.openjdk.java.net/jdk/pull/6931.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6931/head:pull/6931 PR: https://git.openjdk.java.net/jdk/pull/6931 From psandoz at openjdk.java.net Thu Dec 23 18:34:16 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 23 Dec 2021 18:34:16 GMT Subject: RFR: 8277155: Compress and expand vector operations In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 19:20:08 GMT, Paul Sandoz wrote: > Add two new cross-lane vector operations, `compress` and `expand`. 
>
> An example of such usage might be code that selects elements from array `a` and stores those selected elements in array `z`:
>
>
> int[] a = ...;
>
> int[] z = ...;
> int ai = 0, zi = 0;
> while (ai < a.length) {
>     IntVector av = IntVector.fromArray(SPECIES, a, ai);
>     // query over elements of vector av
>     // returning a mask marking elements of interest
>     VectorMask<Integer> m = interestingBits(av, ...);
>     IntVector zv = av.compress(m);
>     zv.intoArray(z, zi, m.compress());
>     ai += SPECIES.length();
>     zi += m.trueCount();
> }
>
>
> (There's also a more sophisticated version using `unslice` to coalesce matching elements with non-masked stores.)
>
> Given RDP 1 for 18 is getting close, 2021/12/09, we may not get this reviewed in time and included in [JEP 417](https://openjdk.java.net/jeps/417). Still, I think it is worth starting the review now (the CSR is marked provisional).

keep-alive

-------------

PR: https://git.openjdk.java.net/jdk/pull/6545

From jwilhelm at openjdk.java.net Thu Dec 23 21:22:01 2021
From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson)
Date: Thu, 23 Dec 2021 21:22:01 GMT
Subject: RFR: Merge jdk18 [v2]
In-Reply-To: <6P4qa0-ey3ccmAWsQvfIev6IB087sfu4lpYtowgR1gE=.1daef932-9673-4d89-98b2-81a35c04e36f@github.com>
References: <6P4qa0-ey3ccmAWsQvfIev6IB087sfu4lpYtowgR1gE=.1daef932-9673-4d89-98b2-81a35c04e36f@github.com>
Message-ID:

> Forwardport JDK 18 -> JDK 19

Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 113 commits:

 - Merge
 - 8279115: Fix internal doc comment errors.
Reviewed-by: mli - 8276302: Locale.filterTags methods ignore actual weight when matching "*" (as if it is 1) Reviewed-by: naoto - 8278766: Enable OpenJDK build support for reproducible jars and jmods using --date Reviewed-by: erikj - 8278125: Some preallocated OOMEs are missing stack trace Co-authored-by: dongyun.tdy Reviewed-by: dholmes, coleenp - 8244669: convert clhsdb "mem" command from javascript to java Reviewed-by: sspitsyn, kevinw, poonam - 8209398: sun/security/pkcs11/KeyStore/SecretKeysBasic.sh failed with "PKCS11Exception: CKR_ATTRIBUTE_SENSITIVE" Reviewed-by: hchao, weijun - Merge - 8279022: JCmdTestFileSafety.java should check file time stamp for test result Reviewed-by: ccheung - 8279018: CRC calculation in CDS should not include _version and _head_size Reviewed-by: iklam, ccheung - ... and 103 more: https://git.openjdk.java.net/jdk/compare/04ad6689...a3fcfa2b ------------- Changes: https://git.openjdk.java.net/jdk/pull/6931/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6931&range=01 Stats: 19323 lines in 545 files changed: 14840 ins; 3205 del; 1278 mod Patch: https://git.openjdk.java.net/jdk/pull/6931.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6931/head:pull/6931 PR: https://git.openjdk.java.net/jdk/pull/6931 From jwilhelm at openjdk.java.net Thu Dec 23 21:22:04 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 23 Dec 2021 21:22:04 GMT Subject: Integrated: Merge jdk18 In-Reply-To: <6P4qa0-ey3ccmAWsQvfIev6IB087sfu4lpYtowgR1gE=.1daef932-9673-4d89-98b2-81a35c04e36f@github.com> References: <6P4qa0-ey3ccmAWsQvfIev6IB087sfu4lpYtowgR1gE=.1daef932-9673-4d89-98b2-81a35c04e36f@github.com> Message-ID: On Thu, 23 Dec 2021 17:11:15 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 18 -> JDK 19 This pull request has now been integrated. 
Changeset: a3b1c6b0 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/a3b1c6b03600da21b00a1f37ea4712096d636b14 Stats: 299 lines in 17 files changed: 141 ins; 126 del; 32 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/6931 From xliu at openjdk.java.net Thu Dec 23 21:40:16 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 23 Dec 2021 21:40:16 GMT Subject: RFR: 8279134: Fix Amazon copyright in various files In-Reply-To: References: Message-ID: On Wed, 22 Dec 2021 09:07:24 GMT, Sergey Bylokhov wrote: > This bug is similar to https://bugs.openjdk.java.net/browse/JDK-8244094 > > Currently, some of the files in the OpenJDK repo have Amazon copyright notices which are all slightly different and do not conform to Amazons preferred copyright notice which is simply (intentionally without copyright year): > > "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved." > > @simonis @phohensee LGTM. I am not a reviewer. Need other reviewers to approve it. ------------- Marked as reviewed by xliu (Committer). PR: https://git.openjdk.java.net/jdk/pull/6915 From phh at openjdk.java.net Thu Dec 23 22:47:11 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Thu, 23 Dec 2021 22:47:11 GMT Subject: RFR: 8279134: Fix Amazon copyright in various files In-Reply-To: References: Message-ID: <7ozNgn5YKtgY8XsZYqexriYLPTVz-yZJJcEs2cxSfuQ=.d731d200-f8ef-4999-9e14-f6be115bdf14@github.com> On Wed, 22 Dec 2021 09:07:24 GMT, Sergey Bylokhov wrote: > This bug is similar to https://bugs.openjdk.java.net/browse/JDK-8244094 > > Currently, some of the files in the OpenJDK repo have Amazon copyright notices which are all slightly different and do not conform to Amazons preferred copyright notice which is simply (intentionally without copyright year): > > "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved." > > @simonis @phohensee Lgtm. ------------- Marked as reviewed by phh (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6915 From serb at openjdk.java.net Fri Dec 24 00:50:57 2021 From: serb at openjdk.java.net (Sergey Bylokhov) Date: Fri, 24 Dec 2021 00:50:57 GMT Subject: RFR: 8279134: Fix Amazon copyright in various files [v2] In-Reply-To: References: Message-ID: <4TylKnNJehPUtdh9y0wh4Cgq93jq4y4mEQfHEKBJdmA=.907bb3ab-9b11-40da-8957-dfef5f3e21e4@github.com> > This bug is similar to https://bugs.openjdk.java.net/browse/JDK-8244094 > > Currently, some of the files in the OpenJDK repo have Amazon copyright notices which are all slightly different and do not conform to Amazons preferred copyright notice which is simply (intentionally without copyright year): > > "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved." > > @simonis @phohensee Sergey Bylokhov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8279134 - Initial fix JDK-8279134 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6915/files - new: https://git.openjdk.java.net/jdk/pull/6915/files/cb05f5bb..52c5d9c3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6915&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6915&range=00-01 Stats: 619 lines in 42 files changed: 446 ins; 100 del; 73 mod Patch: https://git.openjdk.java.net/jdk/pull/6915.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6915/head:pull/6915 PR: https://git.openjdk.java.net/jdk/pull/6915 From kvn at openjdk.java.net Fri Dec 24 01:41:30 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 24 Dec 2021 01:41:30 GMT Subject: [jdk18] RFR: 8279195: Document the -XX:+NeverActAsServerClassMachine flag Message-ID: This product flag provides a way to emulate the old "Client VM" but has never been documented in the java command 
reference (aka manpage). It should be documented. ------------- Commit messages: - 8279195: Document the -XX:+NeverActAsServerClassMachine flag Changes: https://git.openjdk.java.net/jdk18/pull/71/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=71&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8279195 Stats: 25 lines in 1 file changed: 25 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk18/pull/71.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/71/head:pull/71 PR: https://git.openjdk.java.net/jdk18/pull/71 From dholmes at openjdk.java.net Fri Dec 24 02:15:20 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 24 Dec 2021 02:15:20 GMT Subject: [jdk18] RFR: 8279195: Document the -XX:+NeverActAsServerClassMachine flag In-Reply-To: References: Message-ID: On Fri, 24 Dec 2021 01:34:42 GMT, Vladimir Kozlov wrote: > This product flag provides a way to emulate the old "Client VM" but has never been documented in the java command reference (aka manpage). It should be documented. Looks good! Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/71 From duke at openjdk.java.net Fri Dec 24 07:12:03 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Fri, 24 Dec 2021 07:12:03 GMT Subject: RFR: 8278868: Add x86 vectorization support for Long.bitCount() [v3] In-Reply-To: References: Message-ID: > Vectorization support of Integer.bitCount() already exists but currently the same support is lacking for Long.bitCount(). Similar to the C2 PopCountVI node, we created a C2 PopCountVL node and used vpopcntq x86 instruction to enable vectorized Long.bitCount(). This patch shows 2.57x improvement in performance on a JMH micro benchmark due to x86 vectorization. 
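
For illustration, a loop of roughly the following shape is the kind of code such vectorization would target. This is a minimal sketch, not the JMH benchmark from the pull request: the class name `BitCountSketch`, the method `bitCounts`, and the choice of a `long[]` input with an `int[]` output are assumptions made here, and whether C2 actually vectorizes the loop depends on the JVM build and on the CPU supporting the instruction mentioned above.

```java
import java.util.Arrays;

public class BitCountSketch {
    // Hypothetical helper (not from the PR): per-element population count.
    // The loop body is a single Long.bitCount call per element, which is
    // the pattern a vectorized PopCount node would replace.
    public static int[] bitCounts(long[] input) {
        int[] output = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            output[i] = Long.bitCount(input[i]);
        }
        return output;
    }

    public static void main(String[] args) {
        long[] input = {0L, 1L, 0xFFL, -1L};
        // prints [0, 1, 8, 64]
        System.out.println(Arrays.toString(bitCounts(input)));
    }
}
```

The result is the same with or without vectorization; only throughput would be expected to differ.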
Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Use generic vector node names ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6857/files - new: https://git.openjdk.java.net/jdk/pull/6857/files/4567eab8..67f2a71b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6857&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6857&range=01-02 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6857.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6857/head:pull/6857 PR: https://git.openjdk.java.net/jdk/pull/6857 From kvn at openjdk.java.net Fri Dec 24 22:37:13 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 24 Dec 2021 22:37:13 GMT Subject: [jdk18] Integrated: 8279195: Document the -XX:+NeverActAsServerClassMachine flag In-Reply-To: References: Message-ID: On Fri, 24 Dec 2021 01:34:42 GMT, Vladimir Kozlov wrote: > This product flag provides a way to emulate the old "Client VM" but has never been documented in the java command reference (aka manpage). It should be documented. This pull request has now been integrated. Changeset: 2945b786 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk18/commit/2945b786ba6b60cc33153bb1d40ac7a0918dadbe Stats: 25 lines in 1 file changed: 25 ins; 0 del; 0 mod 8279195: Document the -XX:+NeverActAsServerClassMachine flag Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk18/pull/71 From hohensee at amazon.com Sun Dec 26 18:58:10 2021 From: hohensee at amazon.com (Hohensee, Paul) Date: Sun, 26 Dec 2021 18:58:10 +0000 Subject: RFC: improving NMethod code locality in CodeCache Message-ID: <770FAF9C-8B73-4A62-9456-7C7C23EF5753@amazon.com> I've filed a draft "umbrella" JEP that mostly discusses periodically compacting the code cache, but mentions this work as a preliminary project. 
See https://bugs.openjdk.java.net/browse/JDK-8279184: Instruction Issue Cache Hardware Accommodation Thanks, Paul -----Original Message----- From: hotspot-dev on behalf of "Astigeevich, Evgeny" Date: Thursday, December 2, 2021 at 2:08 PM To: "Schmidt, Lutz", "hotspot-dev at openjdk.java.net" Cc: Tobias Hartmann Subject: Re: RFC: improving NMethod code locality in CodeCache Hi Lutz, Thank you for your comments. From the data I've got, the NMethod constants section does not take a lot of space: in total it is below 0.6%. This is similar for both x86_64 and arm64. I guess it should be similar for other architectures. A decision to move stub code out depends on the stub code contribution. For x86_64 it is below 2%. So it might be kept with the main code. On arm64 it is currently up to 12%. I found a couple of issues in the generated stub code. Resolving them should reduce arm64 stub code size. I am not sure it would be possible to get arm64 stub code below 2%. > When considering performance, it is beneficial to have data which is being patched (frequently) separated from the instruction stream. All CPUs I have worked with don't like self-modifying code and mixing code with modifiable data. An exception is literal pools holding constants embedded into code. > Instruction stream compactness may have an influence if the prediction engine not only remembers the branch direction, but the (limited length) distance as well. In "A Survey of Techniques for Dynamic Branch Prediction" (https://arxiv.org/pdf/1804.00261.pdf) I found that a distance between branches can be taken into account. I've seen this. Thanks, Evgeny On 29/11/2021, 12:20, "Schmidt, Lutz" wrote: Hi, a few thoughts immediately popped up when reading Evgeny's RFC and Tobias' comments. If my comments seem influenced by s390x - that might well be. It's the architecture I know best. - The biggest concern I have relates to pc-relative addressing. o nmethod constants are currently located next to the instruction section. 
Putting them into a separately allocated area may break the pc-relative limit. s390x limit: +/- 4GB, no fallback implemented. o relative branches either are + short distance, mostly intra-nmethod + long distance, mostly inter-nmethod + not possible in general, e.g., runtime calls The branch optimization (in shorten_branches) might less often be possible. One example would be if stub code is moved to a separately allocated area. - When considering performance, it is beneficial to have data which is being patched (frequently) separated from the instruction stream. s390x: never modify data in a cache line where instructions are fetched from. That will kill your performance big time. - I'm not a branch prediction expert. Instruction stream compactness may have an influence if the prediction engine not only remembers the branch direction, but the (limited length) distance as well. Thanks, Lutz On 29.11.21, 10:03, "hotspot-dev on behalf of Tobias Hartmann" wrote: Hi Evgeny, Thanks for sharing these results and starting the discussion. Some comments below. On 23.11.21 18:34, Astigeevich, Evgeny wrote: > We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded. Is it really a problem with branch prediction or more with instruction caching? With the current implementation, the hot instructions of a single nmethod are already contiguous but different nmethods might be located far away (and there's lots of metadata in-between). (Re-)moving the metadata will improve locality but does that really have an effect on branch prediction? Did you gather some numbers via hardware performance counters (iCache, ITLB, branch prediction misses)? > The data show that due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. 
The data also show the largest contributors of non-executable data are the header and scopes sections. Arm64 and x86_64 look consistent except for the stub code. On arm64 the size of the stub code is 4-5 times bigger. > > We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data. It would definitely be nice to have this as an option (rather than replacing the current implementation) but I wonder how feasible it is. There is lots of code that depends on the current layout and we would need to make all of that dependent on a flag. > According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache. Different parts of memory for a nmethod need to be allocated/released. 
Ever since I finished the implementation of the Segmented Code Cache (https://openjdk.java.net/jeps/197), I wanted to work on this but never got to it. I think that the additional complexity in the code cache is worth it but of course that has to be proven by a performance evaluation. For reference, here's my old thesis and the paper we published back then: http://cr.openjdk.java.net/~thartmann/papers/2014-Code_Cache_Optimizations-thesis.pdf http://cr.openjdk.java.net/~thartmann/papers/2014-PPPJ-Efficient_Code_Cache_Management.pdf > There is JDK-7072317 "move metadata from CodeCache" 
(https://bugs.openjdk.java.net/browse/JDK-7072317) under which the implementation work can be done. Yes, that makes sense. > There can be different approaches for the implementation: > > 1. What to separate: > a. All code (main plus stub) from other sections. > b. Or only main code because this is the code where an application should spend most of the time. > c. Or the header and scope sections. I would say that from a performance perspective, only the main code matters because the stubs are used for slow paths. If it simplifies prototyping, I would go with b) first. > 2. Where to put: > a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin. > b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset. > c. Or in a completely different place (C-heap, Metaspace,...) It depends on what we want to improve: (i) Code locality in the same nmethod or (ii) code locality between different nmethods. Solution b) would only improve code locality in the same nmethod but the overall layout of executable code in the code cache would still be sparse. I think c) would be the ideal solution: The code cache would only contain executable code and all the metadata would be somewhere else. But solution a) would lead to the same layout and might be easier to implement. 
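
To make the wording of option 2b in the quoted list concrete ("code grows from lower addresses upwards and metadata from high addresses downwards"), here is a toy model of a single segment allocated from both ends. It is a sketch under stated assumptions, not HotSpot's CodeHeap: the class `TwoEndedSegment` and its methods are hypothetical, integer offsets stand in for addresses, and freeing, alignment and compaction are ignored.

```java
public class TwoEndedSegment {
    private int codeTop;     // next free offset for code; grows upwards
    private int dataBottom;  // lowest offset used by nmethod data; grows downwards

    public TwoEndedSegment(int size) {
        this.codeTop = 0;
        this.dataBottom = size;
    }

    /** Reserves bytes for code at the low end; returns the start offset, or -1 if full. */
    public int allocateCode(int bytes) {
        if (bytes < 0 || bytes > dataBottom - codeTop) {
            return -1;
        }
        int start = codeTop;
        codeTop += bytes;
        return start;
    }

    /** Reserves bytes for non-executable data at the high end; returns the start offset, or -1 if full. */
    public int allocateData(int bytes) {
        if (bytes < 0 || bytes > dataBottom - codeTop) {
            return -1;
        }
        dataBottom -= bytes;
        return dataBottom;
    }

    /** Bytes still unused between the two growing regions. */
    public int freeBytes() {
        return dataBottom - codeTop;
    }

    public static void main(String[] args) {
        TwoEndedSegment seg = new TwoEndedSegment(1024);
        System.out.println(seg.allocateCode(300)); // 0
        System.out.println(seg.allocateData(200)); // 824
        System.out.println(seg.allocateCode(100)); // 300
        System.out.println(seg.allocateData(500)); // -1, only 424 bytes remain
        System.out.println(seg.freeBytes());       // 424
    }
}
```

In this model the code of successive allocations is adjacent at the low end of the segment. How the layout would look once nmethods are freed and the space is reused is exactly the part the sketch leaves out.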
> It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property. Yes, that is a concern. A thorough performance evaluation is required. > We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317. Hope that helps. I'm curious what others think. Best regards, Tobias Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. From serb at openjdk.java.net Sun Dec 26 22:14:15 2021 From: serb at openjdk.java.net (Sergey Bylokhov) Date: Sun, 26 Dec 2021 22:14:15 GMT Subject: Integrated: 8279134: Fix Amazon copyright in various files In-Reply-To: References: Message-ID: On Wed, 22 Dec 2021 09:07:24 GMT, Sergey Bylokhov wrote: > This bug is similar to https://bugs.openjdk.java.net/browse/JDK-8244094 > > Currently, some of the files in the OpenJDK repo have Amazon copyright notices which are all slightly different and do not conform to Amazons preferred copyright notice which is simply (intentionally without copyright year): > > "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved." > > @simonis @phohensee This pull request has now been integrated. Changeset: 7fea1032 Author: Sergey Bylokhov URL: https://git.openjdk.java.net/jdk/commit/7fea10327ed27fcf8eae474ca5b15c3b4bafff2a Stats: 15 lines in 14 files changed: 0 ins; 1 del; 14 mod 8279134: Fix Amazon copyright in various files Reviewed-by: xliu, phh ------------- PR: https://git.openjdk.java.net/jdk/pull/6915